Koch has argued that psychology is an imitation science, because it has failed to build an edifice of positive 
knowledge; and that it cannot logically do any better in the future. 

The paper rejects this sceptical argument. The sceptics appear to be victims of the accumulation-cum- 
building picture of scientific progress, and prisoners of the mistaken presupposition that progress in science 
consists in either the achievement of a paradigm, or the subsequent development of one. The paper points to 
weaknesses in the thesis that it is not logically possible for psychology to do any better in the future and 
achieve a paradigmatic advance. 

But though the sceptics’ case against psychology is a bad one, their case against a traditional view about 
the nature of psychology is sound; and this suggests that we should think in a different way about the 
subject. 


I 


Koch (1974) has argued that psychology is an ‘imitation science’. By this statement he appears 
to mean that what psychologists have produced is a subject which ‘imitates’, in its methods and 
results, a genuine science (such as physics or chemistry); but which is really so dissimilar from 
the genuine sciences that it fails to be a science at all, and, therefore, merits the description 
‘imitation science’. 

Koch summarizes his reasons for this argument in the following way: 


The idea that psychology - like the natural sciences on which it is modelled - is a cumulative or progressive 
discipline is hardly borne out by its history. Indeed, there could be a way of writing the history of modern 
psychology which would have to acknowledge that most of the well verified and solid ‘advances’ of any 
generality are registered by clusters of findings that help reveal the utter inadequacy of long-flourishing 
analytical frameworks or so-called ‘theories’. The hard knowledge that accrues in one generation typically 
disenfranchises the theoretical fictions of the last - and any new theoretical framework it is believed to 
suggest, or support, typically survives only until the next. If psychology be science, it is ‘science’ of a 
strange kind. Its larger generalizations are not specified and refined over time and effort; they are merely 
replaced. Throughout its history as ‘science’, the hard knowledge which it has deposited has usually been 
negative knowledge! 


This is an argument from the history of the subject. Psychology has failed to produce the 
steady accumulation of knowledge that characterizes progress in the natural sciences. No doubt, 
psychologists have discovered a great many particular hard bricks. But they have not been able 
to use these bricks to build an edifice of hard knowledge. In so far as there has been a growth or 
accumulation of knowledge, this has been negative in character. Psychologists have discovered 
what is not the case; they have failed to discover what really is the truth about human nature and 
its functioning. 

But Koch goes further than this. It is not just that psychologists have failed to produce an 
edifice of hard knowledge in the past. They will not do any better in the future. They will not do 
so because they cannot; and they cannot because the methods of natural science they use and 
espouse cannot logically be extended to deal with psychological phenomena. 

Well, why not? Koch distinguishes between two types of ‘subfields of psychology’. On the 
one hand, there is ‘sensory psychology’ and ‘biological psychology’. On the other hand, there 
are subfields such as ‘perception, cognition, motivation and learning, social psychology, 
psychopathology and personology’. The former (i.e. sensory and biological psychology), he says, 
‘might just as well (and perhaps more fruitfully) be regarded as parts of a biological science’. 


The latter type, he says, are ‘close to the heart of psychological studies’; and it is with these that 
he is really concerned. He maintains that scientific methods cannot logically be extended to 
these fields because 


in all these areas such concepts as ‘law’, ‘experiment’, ‘measurement’, ‘variable’, ‘control’, ‘theory’, do 
not behave sufficiently like their homonyms in the established sciences to justify the extension to them of the 
term ‘science’. To persist in the use of this highly charged metaphor is to shackle these fields of study with 
exceedingly unrealistic expectations concerning generality limits of the anticipated findings, predictive 
specificity 


and so on. In other words, Koch is claiming that the subject matter of study in these areas of 
psychology is so different from the subject matter in the natural sciences that it is useless to try 
to apply scientific methods to them. If we do, and we speak of ‘psychological experiment’, 
‘variables’, and the rest, we are just deluding ourselves. These words and expressions have no 
genuine application in the psychological field. If we go on using them, the effect is to produce 
‘spurious knowledge’ and an ‘imitation science’. 

Now these two arguments — from history and from the logic of the subject matter of 
psychology — appear to represent the gist of Koch’s criticism of contemporary psychology. In 
presenting this criticism Koch does not stand alone. He is only one of several students of the 
subject who, in recent years, have presented a sceptical view of the status and prospects of the 
subject — of the discipline that is taught in British and American universities. I have chosen to 
concentrate on Koch’s criticisms, because he is perhaps the most vociferous and well known of 
the sceptics. Obviously, if his sceptical arguments are correct, if psychology really is an 
imitation science and bound to remain one, then the stand taken by psychology departments 
throughout this country and the USA collapses at once; and they will all be obliged to undertake 
a major task of rethinking and reorganization. 


II 


How sound is this criticism of contemporary psychology? I begin with the argument from 
history. Has the history of psychology been a disaster - a negative achievement - from which no 
edifice of knowledge has emerged, and therefore, a history in which no progress has been 
attained? 

Let us examine this argument by looking, very briefly, at a piece of psychological history 
which Koch accepts as illustrating splendidly the truth of his negative thesis about the history of 
the whole subject. Let us look at the history of the work on learning and learning theory. As his 
thesis is itself a historical one, we must cease being psychologists and must put on the spectacles 
of the historian. Accordingly, we have to ask at this point, pace Collingwood, ‘what was the 
question that psychologists of learning were asking themselves at the beginning of the period — 
say, at the time of Watson and just after World War I?’ (Watson, 1914, 1924). 

We have to remember that psychologists were then coming to feel the impact of two critical 
developments in related fields. In biology the work of Darwin, and his successors, drew 
attention to the instinctive equipment of animals and their comparative powers. In physiology, 
everyone was still assimilating the work of Sherrington and others, and the concept of the reflex. 
The latter was taken to provide the key to the working of the nervous system; and this was 
regarded as a throughput system, which enabled the organism to make an adaptive response, 
under the influence of its innate equipment and its past experience in the world. So the general 
question psychologists of learning asked was this: how does the experience of the organism 
enable it to use its innate equipment to acquire a repertoire of adaptive responses? As we all 
know, in trying to answer this question, students of learning were naturally and reasonably 
affected by the original work of Pavlov - theoretically and methodologically. Consequently, the 
whole S-R programme was itself a natural and appropriate response to the problems posed for 
learning psychologists at this time. 

They then began to uncover more and more of the complexities of acquired adaptive 
behaviour; and this whole field was explored objectively and fairly carefully in a way that it had 
not been explored before. As a result, we came to appreciate all sorts of things we did not know 
before; and the texts of learning over this period reveal fairly clearly what these things were. 

However, this work had barely got under way when two discoveries were made about brain 
functioning, which altered the whole background that had generated the work on learning in the 
first place. (1) By the use of experimental methods, Lashley took the first steps to show that the 
brain and cortex could not be regarded as a system built like a telephone exchange. (2) Adrian 
and Matthews confirmed the Berger rhythm, thereby suggesting that the brain was a centre of 
spontaneous activity of its own. These developments in physiological psychology fed into the 
already changing emphasis among psychologists of learning. The latter had been moving from an 
exclusive concern with the environmental input to an emphasis on the internal states of the 
organism. This happened largely through the influence of Tolman and Clark Hull. As a result, 
psychologists came to develop structural-cum-functional accounts of how organisms work. In the 
years after World War II, we came to appreciate both the strength and the limits of this way of 
proceeding at this time. It became clear, also, that the earlier workers had missed phenomena 
such as reversal learning; that it was very difficult to decide between one and two process 
learning theories; that the role of selective attention was difficult to pin down; and that, generally 
speaking, the subtleties of organismic reactivity were very much greater than had been supposed 
during the decades of grand theory building in learning. 

Now, with this crude historical reminder behind me, let us return to the sceptics’ criticism of 
psychology. Is it adequate to describe all this work on learning as a history of disastrous 
retreat? — as a negative achievement? I believe that the judgement of the historian of science will 
be: certainly not. This work on learning represents a very considerable positive achievement. In 
it psychologists have opened up and carefully explored an area that is essential for the 
understanding of how organisms function, whether animal or human. This work shows us that 
they are very much wiser men today than they were in Watson’s time. If some of them are 
landed on Mars with the first astronauts, they will be able to study the strange species of man 
they find there with a subtlety and sophistication that they would not have possessed in 1914 or 
1920. For they are in possession today of concepts and skills that they did not have 50 years ago. 
In this sense, they are standing on the shoulders of their predecessors. This means that in some 
sense there has been a significant development of knowledge in this field - which represents an 
achievement by psychologists that it is just silly to denigrate. 

I happen to have picked on work in learning as an example. But I think I could have picked 
on other examples of psychological work, which would reveal the same sort of picture as learning 
does. Indeed, the quickest way to see the limitations of the sceptics’ negative thesis about 
progress in psychology is to keep our historical spectacles firmly on our noses, and then simply 
to compare and contrast the textbooks of psychology of (say) 50 or 40 years ago with those in 
current use. 


III 


Can the sceptics reply to this argument? Yes, I think they may be tempted to reply that, if all 
psychology can show for itself is progress of the sort I have sketched, then its showing is very 
poor indeed. For where is the body of established and accepted theory we expect from science? 
Where are the universal and lawful generalizations that we expect a genuine science to produce? 
It is obviously very difficult to find any such established theory (or theories) in psychology; and 
equally difficult to find many, or any, invulnerable lawlike generalizations. Psychology has 
manifestly failed to produce an edifice of scientific knowledge. Therefore, it remains true to say 
that it has failed to achieve the progress characteristic of science. 

Is this so? It is evident that this argument presupposes that all progress in science is of one 
sort. Progress only occurs if the hard bricks collected are put together into a building, or edifice, 
of knowledge. Is this presupposition correct? In my view, and I think in the view of most 
historians and philosophers of science, the answer is: emphatically no; the 
accumulation-cum-building picture of scientific progress is quite inadequate. It is easy to expose 
its inadequacy. All we need do is to remind ourselves of some examples which are quite 
manifestly examples of scientific progress, but which the accumulation-cum-building picture 
cannot accommodate. 

In 1839 a young naturalist published his notes of a journey round the world - a rambling 
series of unrelated, particular jottings and queries about what he had observed — animals, terrain, 
climate, and so on and so forth. At the time this report merely added to the naturalists’ stockpile 
of particular facts. At that time, and on the building view, this report could not be said to 
constitute a step in the progress of science. Yet we would all agree today that this report by 
Charles Darwin (1839), in The Voyage of the Beagle, represented a critical step in the advance of 
contemporary biology (see also Darwin, 1859). Consider, next, the history of our understanding 
of mental disorder. When Pinel and others began the humane treatment of the insane, they also 
set in train an attempt by the medical world to bring some sort of order into the phenomena 
presented by the behaviour disorders. But to do this is to reject the pre-existing belief that to be 
mad was to be possessed, and to adopt, instead, the assumption that the phenomena were 
natural ones, and therefore ones falling within the scope of scientific inquiry. This was a great, 
indeed the very first, step forward in the scientific study of this field. But it is an example of 
scientific progress that the building model cannot cover. What is more, during the 19th and early 
20th centuries, the medical world succeeded to a large extent in ordering the phenomena in this 
field by means of the post-Kraepelinian classification with which we are all familiar. But, on the 
building view, it is doubtful whether we can say that this classificatory achievement represents 
any scientific progress at all (Kraepelin, 1905-6). Likewise for any other classificatory scheme - 
for example, in zoology itself. And all this is just absurd. As a last example, recall the state 
of chemistry between, say, 1700 and 1770. At that time, we would have been right to describe 
chemistry as a jungle of confusion with no discernible thread of progress in it, a splendid 
example, in fact, of what Koch should describe as an imitation science. Yet, with our present 
hindsight, we can see very well, and in detail, how this work prepared the way in an 
indispensable fashion for the break-through by Lavoisier (1789), and the founding of modern 
chemistry. 

I hope these few examples are sufficient to show that the sceptics’ view of scientific 
progress will not do. Even if we allow that it fits certain cases of progress, it cannot be 
generalized to cover all of them. The obvious reason for this is that science is a very large and 
rambling mansion. What goes on in one room is often very different from what goes on in 
another; progress in one room, therefore, may take a very different form from progress in 
another. 

But this is only part of the story. I suspect that Koch, and other sceptics, have also been 
misled by another and still deeper presupposition that they have unwittingly adopted. This 
presupposition is not an easy one to state shortly. I will state it with the aid of Kuhn’s 
distinction (1962) between paradigmatic and preparadigmatic science. However, what I propose 
to say here is not logically tied to Kuhn’s distinction, and can be restated without using it. I 
resort to Kuhn’s distinction and language simply because it is well known and convenient for my 
restricted purposes. 

The sceptics are presupposing that all progress in science consists either in the achievement of 
a paradigm; or in the results of post-paradigmatic inquiry. That is to say, progress in science is 
to be found either in the achievement of, for example, Newton in the Principia or in the Opticks; 
or in the subsequent development of these paradigms. In these achievements and their 
developments we have edifices of knowledge. As the sceptics are controlled by this 
presupposition, they will naturally take it for granted that progress in science is all of one type, 
namely that which puts bricks together to make an edifice of knowledge, or which goes on 
adding bricks to extend the edifice. Because psychology has not done this, it has failed to 
progress and this shows us that it is a bogus or imitation science. I hope I am right in saying 
that, at this point, we need hardly do more than dig out this presupposition and expose it to the 
light, in order for us to realize that it simply will not do. For it obviously carries with it the 
consequence that there is no such thing as progress during the preparadigmatic stage of a 
science. It means that we cannot say, for example, that chemistry progressed between 1700 and 
1770. This is just absurd. So the sceptics’ general view of progress in science is very misleading 
and quite inadequate. 


IV 


I come now to the second part of the discussion. Koch’s criticism of psychology as an imitation 
science only becomes interesting when he goes on to argue (in effect) that psychology never will 
achieve a paradigm, and hence never will develop in the way that the natural sciences have 
done. He presents this case in the second argument I mentioned at the beginning, namely that 
the subject matter of psychology is such that the concepts of experiment and variable, and the 
like, and scientific method in general are not applicable to this subject matter. 

Now, in the paper from which I have quoted, Koch does not tell us at all clearly just what the 
precise reasons are, on which he relies to support this argument of inapplicability. I confess I 
find the same lack of clarity when I consult some of his other writings on the subject (e.g. Koch, 
1959, 1964). This is a pity, because there are reasons available, which lend considerable weight 
to his argument. Let me look briefly at what is, I think, the strongest of these supporting 
reasons, and one which, I think, Koch himself would be ready to endorse. 

If we are to achieve a paradigm in any field, then, presumably, we will have to arrive at some 
generalizations about the field that we will agree are true. But (roughly put) a generalization 
connects some property A with some other property B in a certain domain. Therefore, to be able 
to state a true generalization connecting A and B, we must be able to ignore all other properties 
that may also be exhibited by the particulars of the domain involved. That is to say, we must be 
able to abstract A and B from the other properties of the domain, and still go on to assert a true 
generalization connecting them. If abstraction is not possible in a field, then it follows that 
scientific method - with its concepts of experiment and variable, etc. - is not applicable to this 
field, and Koch’s second argument is true of it. 

Well, is the abstraction of properties possible in psychology or not? There is a logical, or a 
priori, case for non-abstractability in psychology. It would be widely accepted that ‘the heart of 
the subject’ (in Koch’s phrase) is concerned with functions such as perception, attention, 
memory, learning, intelligence, motivation and so forth. Now, for me to assert ‘Johnny is still 
learning the rules of castling in chess’ is also to assert that, when playing chess, Johnny will 
sometimes remember how to move and sometimes not. For me to assert that ‘Smith saw X’ is 
(in certain key contexts) also to assert that ‘Smith attended to something or other at the time’. 
In short, the concepts involved in our mental functions are logically connected, and (it could be 
maintained) connected intimately and on an extensive scale. From this it follows that abstraction 
is not possible. It will not be possible for me to produce a true generalization about one of them 
without also necessarily having to produce true generalizations about an indefinite number of 
other functions at the same time. Therefore, experiment and scientific method in general are not 
applicable to mental functions. 

I think this a priori argument for non-abstractability is a weak one. (a) Even if these concepts 
are logically connected in an intimate and extensive way, it does not follow from this fact that 
no abstraction is possible at all. What does follow is that abstraction is a difficult exercise. I 
think psychologists accept this conclusion. They recognize that theirs is a difficult subject. (b) 
The critical question is this: will the abstraction (which it is possible for a psychologist to carry 
out) necessarily be insufficient, or not enough, to permit psychology to make preparadigmatic 
progress up to the stage where a paradigm can be achieved? The a priori argument fails to show 
that this is the case. In other words, it fails to show that the difficulties of abstraction are such 
that psychologists will necessarily not be able to make the preparadigmatic progress sufficient for 
them to achieve a paradigmatic transformation of their subject. (c) In any case, it could be said, 
the a priori argument from non-abstractability holds for the concepts of ordinary discourse. 
What it shows, therefore, is that these concepts do not help particularly towards the scientific 
understanding of how men and animals function. What the a priori argument draws attention to 
is the need to replace ordinary concepts by other, technical ones which do not produce these 
difficulties. 

So much, then, for the a priori argument. There is also, however, an a posteriori, or empirical, 
argument for non-abstractability in psychology. This appeals to the history of the subject, and 
stresses the notorious fact that psychologists have not unearthed many satisfactory or 
invulnerable law-like generalizations. The argument can be boiled down into a crude syllogism. 

If psychological properties are (generally) abstractable, then many satisfactory generalizations 
will have been found by psychologists. 

Many such generalizations have not been found by psychologists. 

Therefore, psychological properties are not (generally) abstractable. 

Obviously, the weight of this argument depends on the strength of the major premise. Have 
we good reason to believe it? Have we, in other words, good reason to believe that abstractability 
is a sufficient condition for the discovery of satisfactory generalizations in psychology? Clearly 
not. Abstractability is not sufficient by itself; other conditions are manifestly also required before 
psychologists can arrive at satisfactory generalizations. The two most obvious ones are: (1) an 
adequate background of knowledge, and therefore of concepts, in the field and in other related 
ones; (2) techniques of investigation strong enough to do the job. It is plausible to argue that it is 
the absence of these two conditions during the history of modern psychology that is largely 
responsible for the history and state of the subject, not the (alleged) non-abstractability of 
psychological properties. 

If I had the space, I would like to make all this clearer by examining some examples from the 
history of science. It will have to suffice for me to refer to one I have mentioned before, namely 
chemistry. Suppose we take ourselves back to 1700 (say), and look at the state of the subject at 
that time. We can then construct, I think, a splendid argument to the effect that chemistry is an 
imitation science, in which experiment is impossible, and which will never achieve a paradigm. 
With the hindsight of the 20th century, we can appreciate clearly that, and why, this sceptical 
argument is quite wrong of chemistry in 1700. Such an appreciation is sobering. For it makes us 
realize that the sceptics’ case today against psychology, however beguiling, may be quite wrong 
also. [For an introduction to the state of chemistry at the time, see Toulmin & Goodfield (1962). 
This reference is sufficient to suggest how a good sceptical case against chemistry could have 
been constructed at that time.] 

Indeed, when one takes an overall view — through a historian’s spectacles — of the history of 
the sciences, the conditions that favour and hinder progress, and so on, one comes away (I 
think) with a relatively optimistic picture of the prospects of psychology. The development and 
progress that the subject has achieved to date - such as that shown, for example, by the 
psychology of learning I considered earlier - is quite characteristic of the preparadigmatic stage of 
science. There is nothing in the history of science in general, and in that of psychology in 
particular, to show, or even suggest, that psychology will not reach the paradigmatic stage in time. 


V 


The next question is an obvious one. If the sceptics’ case is mistaken and it is possible for 
psychology to achieve a paradigm, how are we to achieve it and what form (or forms) will it 
take? It should be evident from what I have said already that we cannot infer from our present 
knowledge and from the present state of psychology what form a paradigm will take in it. It 
should also be evident that there are no methodological rules available for achieving a paradigm. 
If psychologist X urges us to adopt a certain method, he cannot prove that this method will get us 
there. At most he can offer us a bet or hunch - with supporting reasons - that a certain research 
strategy will pay us the best dividends. When a Skinnerian, for example, urges his experimental 
analysis of behaviour upon us, some psychologists may object on the ground that this strategy is 
very superficial and limited, and that we obviously need to know about the internal conditions and 
cognitive functioning of the organism. But the Skinnerian may be right. Let us remember, after 
all, that the first and great paradigm in biology was achieved by Charles Darwin without a 
knowledge of Mendelian regularities or the mechanisms of heredity — and solely on the basis of 
the macrophenomena of species’ differences and survivals and the like. It is logically possible 
that a 20th century Darwin in psychology will emerge to achieve a paradigmatic synthesis of the 
behavioural discoveries of Skinner and others. When, in contrast, a psychologist urges us to 
rely on what he may claim has been underused in recent decades, namely the power of the 
human organism to report what he is thinking, feeling, etc., others may object to this policy. 
They may object on the grounds that human introspection and reports have a very limited 
reliability and use. Still, this psychologist may be right, in spite of these objections. 

My own hunch is very different, but quite orthodox and quite unexciting. I think that a 
paradigm will be achieved when we come to understand how the nervous system works. I think 
that perhaps the first paradigm may be achieved when we come to understand how it subserves 
some type of animal learning - say, Pavlovian conditioning or some simple visual discrimination 
learning (for example, of brightness). I would back this hunch by an argument from analogy. In 
the last 250 years of science, we have ordered the varied panorama of nature in a way that 
represents a remarkable achievement by the human mind. We have done this by postulating and 
finding minute entities in relations, which are responsible for the panorama. In the course of the 
last 120 years or so, physiologists have also succeeded, in this same way, in ordering many 
aspects of the functioning of organisms. A good, and very promising, beginning has been made 
on the nervous system itself. Hence there are good reasons to believe that the corpuscularian 
tradition will also be found to apply to the nervous systems of organisms, and the panorama of 
activity that these systems subserve. If this really turns out to be the case, then the 
psychological phenomena that organisms present will also be shown to have an order that is the 
manifestation of the operation of minute entities in relations. And the route to the first, and 
immediately succeeding, paradigms in psychology will take the form indicated by the 
corpuscularian tradition of science. 

I am aware that, in recent years, some psychologists have raised sceptical doubts about this 
paradigmatic route for psychology. Thus, it has been argued that it is utterly impractical, because 
we will not be able to obtain sufficient control over the internal states, or conditions, of an 
organism to find out what the minute entities are, and how they subserve the activity of the 
organism. Let me, as an outsider, respectfully advise the sceptics not to take this stand. The 
history of the last 30 or 40 years points very firmly in the opposite direction. Techniques of 
electron microscopy, of single unit analysis, and in the biochemical study of impulse 
transmission all suggest that we have at last got down to the level of minuteness required. These 
techniques cast very serious doubts on the thesis that the internal states of organisms are beyond 
the bounds of practical inquiry. However, it has also been argued that, if the study of internal 
states does turn out to be practical and successful, the result will be the death of psychology and 
its replacement by neurophysiology. I think this argument is quite fallacious, and its fallacies 
have been sufficiently exposed in recent years. For one thing, we cannot logically even discover 
how the minute entities of the nervous system are connected with mental functions without the 
use of psychological concepts and methods of inquiry (Fodor, 1968). 


VI 


From all this it is evident that I think the case of the sceptics is a bad one. It is simply not true 
that psychology is an ‘imitation’ science; this description is quite misplaced. What the critics are 
really trying to tell us, I think, is that the traditional picture of the nature of psychology just will 
not do. In other words it is quite wrong to think of contemporary psychology as being like 
physics, or any other post-paradigmatic natural science. Here the sceptics are right; this 
traditional picture of psychology has indeed broken down. But it is quite unwarranted to 
conclude from this that psychology itself has broken down, and that psychologists are doomed 
for ever to walk the dark and dreary nights of preparadigmatic science. This conclusion simply 
does not follow. 

Accordingly, I suggest that we take a different view of the subject. I suggest we look upon it 
as a scientific inquiry, which is still in its exploratory stages. Because of this, it uses theory and 
experiment as exploratory tools - charting the phenomena of animal and human activity. The 
progress it achieves, therefore, is characteristically preparadigmatic. On the other hand, there is 
some reason to believe that we are en route to the achievement of a paradigm in the subject - 
something that we may achieve in this coming century when, for example, we do begin to 
understand how the brain works. In the meantime, allow me, as a philosopher, to say to 
psychologists: Be of good cheer! Do not allow yourselves to be overcome by depression and 
hypochondria about your own work and subject. Of course, if you do find yourself running out 
of ideas, and, in consequence, that you can no longer stand the heat of psychological inquiry, all 
you have to do is to remember President Truman’s advice: ‘Get out of the kitchen’. But, please, 
do not then react to the whole subject in a way that gives the impression you are a psychologist 
manqué, rationalizing your disillusion by depreciating the subject and the work of fellow 
psychologists. If, on the other hand, you decide to remain inside the psychological kitchen, then 
obviously what the subject (like any other science) will require from you is not philosophical 
scepticism about it, but persistence, patience and originality inside it. Act, therefore, so as to 
foster these virtues. In particular, try to maintain your own creative impulses. Then you may 
succeed collectively in achieving the progress in psychology that will make otiose all papers like 
this one, which I have just presented to you. 


The effect of catch-trials on speed and accuracy among introverts and 
extraverts in a simple RT task 


John Brebner and Rosemary Flavel 


Three predictions from the model of extraversion put forward by Brebner & Cooper (1974) were tested in a 
simple RT task. In line with the model, extraverts were found to make more commissive errors and to be 
more affected by an increase in the catch-trial rate. The third prediction, that extraverts would tend to 
produce longer runs of decreasing RTs than of increasing RTs in the condition with 10 per cent catch-trials, 
was not borne out. This effect was, however, seen in the condition with 40 per cent catch-trials, although it 
was not statistically significant (P = 0.06). 


In a recent article, Brebner & Cooper (1974) proposed a simple model of introversion-extraversion. 
This model made the assumption that the effects of stimulation impinging upon the 
individual, and the demands for active responses from him, were independent of each other and 
that either could have central excitatory or inhibitory effects. From this standpoint, it was 
suggested that both introverts and extraverts are characterized by an imbalance between the 
effects of stimulation on the one hand and response organization on the other. In the case of the 
introvert, stimulation was hypothesized to create an excitatory state (S-excitation) but response 
preparation to build up an inhibitory state (R-inhibition). The extravert was characterized as a 
person with the opposite tendency, that is, to generate excitation from the organization and 
emission of responses (R-excitation) but, in the absence of response demands, the effects of 
stimulation rapidly become inhibitory (S-inhibition). 

Those types of ‘response’ which occur as perceptual or cognitive integrations, and the 
feedback resulting from one’s own responses, were both specifically classified as stimuli within 
the model. 

Because the introvert tends to generate excitation from stimulation but inhibition from active 
responding, Brebner & Cooper described the introvert as ‘geared to inspect’, and his extravert 
counterpart as ‘geared to respond’ because of his opposite tendency to generate R-excitation but 
S-inhibition. Thus, the extravert might be more accurately described as ‘response hungry’ rather 
than ‘stimulus hungry’, a term in current use, even though with sufficiently varied and intense 
stimulation it is possible to maintain S-excitation even in the extreme extravert. 

The first evidence supporting this specific model (Brebner & Cooper, 1974) showed that the 
reaction times of extraverted subjects were more affected by S-inhibition than those of 
introverted subjects. This result was obtained under conditions which precluded explanation in 
terms of characteristic arousal levels or feedback-mediated S-inhibition. Moreover, the strength 
of the effect was shown to vary with time on task, which was also predictable from the model. 
The experiment reported below tested the complementary hypothesis, derived from the same 
model, that R-excitation would be stronger for extraverted subjects than for introverts. 

The predictions tested were that, in a simple RT task involving ‘catch-trials’: 

(1) Extraverted subjects would make more commissive errors than introverted subjects. Since 
‘response hungry’ extraverts are postulated to generate R-excitation from the organization and 
emission of response, and to depend on R-excitation to maintain their overall state of arousal, 
they should show a stronger tendency to respond in the absence of the appropriate signal than 
introverts who are postulated to generate R-inhibition from the organization and emission of 
responses. 

(2) The speed and accuracy of extraverted subjects would be relatively more affected than 
that of introverts as the proportion of catch-trials increased. This second result is predicted on 
the grounds that the degree of R-excitation is related to the response rate which is affected by 
the proportion of catch trials. 

A further relevant statement by Brebner & Cooper (1974) was that ‘. . . where R-excitatory 
potential is high but S-excitatory potential is low the extraverts’ performance would be relatively 
better until S-inhibition increased as a function of responding and acted to decrease the overall 
excitatory potential in the extraverted subjects’. If in the present experimental situation 
responding is sufficiently frequent to build up S-inhibition through feedback stimulation, then one 
might predict (3) an effect akin to repeated ‘involuntary rest pauses’ either in occasional failures 
to respond or increased response latencies. These temporary failures differ from the general 
lowering of responsiveness associated with a continuing high S-inhibition level which results 
from the lack of opportunity to respond. That effect is expected to reduce speed of performance 
among extraverts as the catch-trial rate rises. The modulating effect of the ‘involuntary rest 
pause’ phenomenon should, on the other hand, occur during periods of repeated responding and 
is most likely to be found where the catch-trial rate is lowest. The parallel between this effect 
and Eysenck’s (1955) original ‘reactive inhibition’ will be evident. 


Method 
Apparatus 


The stimulus for response was the digit ‘1’ presented on a Nixie tube located behind a small glass screen set 
in a timing and display control unit with a matt black painted surface. This unit was linked both to a 
paper-tape reader, through which programmed tapes controlling the stimulus display were fed, and also to a 
teletype which encoded the relevant information regarding stimuli and reaction times on to paper tape. The 
punch was also connected to the response Morse key operated by the subjects. Inter-trial intervals were 
regular, of 2-3 sec duration, and signals were displayed for 200 msec. Each trial was preceded by a warning 
light, which became visible in the top left-hand corner of the display screen before the signal was due to 
occur; the warning signal was of the same 200 msec duration as the signals. 


Subjects 


Sixteen undergraduate subjects from the University of Adelaide, ranging in age from 17 to 24 years (mean 
age = 19.4 years), took part in the experiment. Selection was made on the basis of E-scores obtained on 
Form A of the EPI. Those classified as introverts scored in the range 2-5, while the extraverts’ scores ranged 
from 19 to 23. All subjects had normal vision. 


Procedure 


The subject was seated approximately 1 metre from the display screen with his index finger resting lightly on 
the response key. To ensure that the subject understood the nature of the task he was given a practice run 
of five signals to which he responded. Following this, the subject was given a block of 100 trials without 
catch-trials and was required to respond as quickly as possible to every signal. The purpose of this task was 
to check for differences in reaction times between the introverts and extraverts not due to the experimental 
treatments of this study. The experimental session was then begun. 

In the experimental condition subjects were required to respond as quickly as possible to the digit ‘1’ but 
not to respond on any catch-trial. Catch-trials were randomly distributed throughout the series and consisted 
of the warning light followed by a blank screen. Three blocks of 200 trials were given, representing three 
different stimulus conditions. Under condition A, 10 per cent of the total were catch trials, in condition B 
there were 40 per cent catch-trials, and in condition C 70 per cent. The order of presentation of the three 
conditions was varied randomly to minimize order effects. 
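
The trial structure described above can be sketched in code. This is only an illustration under stated assumptions: the original sequences were punched on paper tape and their generation is not described, so the function names, the use of an exact catch-trial count per block, and the shuffling method are all hypothetical.

```python
import random

# Catch-trial percentages for the three conditions, as given in the text.
CONDITIONS = {"A": 10, "B": 40, "C": 70}


def make_block(catch_percent, n_trials=200, rng=None):
    """Return a shuffled list of 'signal' / 'catch' labels for one block.

    Assumption: the block contains exactly the stated percentage of
    catch-trials, placed at random positions.
    """
    rng = rng or random.Random()
    n_catch = n_trials * catch_percent // 100
    trials = ["catch"] * n_catch + ["signal"] * (n_trials - n_catch)
    rng.shuffle(trials)
    return trials


def make_session(seed=None):
    """Return [(condition, trials), ...] with the condition order randomized."""
    rng = random.Random(seed)
    order = list(CONDITIONS)
    rng.shuffle(order)
    return [(name, make_block(CONDITIONS[name], rng=rng)) for name in order]
```

Calling `make_session(seed=1)` yields three blocks of 200 trials each, in a random order, with 20, 80 and 140 catch-trials in conditions A, B and C respectively.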


Results 

Commissive errors 

Table 1 shows the number of times responses were made on catch-trials by subjects under the 
three conditions of the experiment. It is clear from the table that extravert subjects tended to 
make more of these commissive errors, and that the frequency of such errors is inversely related 
to the proportion of catch-trials. χ² confirms the first prediction that extraverts do make 
significantly more errors (χ² = 52.4, d.f. = 1, P < 0.01). The difference between the three 
conditions is also significant (χ² = 15.3, d.f. = 2, P < 0.01). 


Table 1. Errors committed by responding on catch-trials 

                                    Condition
Subject    E-score    A (10%)    B (40%)    C (70%)    Total

Extraverts
 1         21          7          8          5          20
 2         23          5          4          2          11
 3         22          4          0          0           4
 4         21          9          4          3          16
 5         20         14          9         11          34
 6         19         10          6          5          21
 7         22          4          5          3          12
 8         19          2          0          0           2
Total                 55         36         29         120

Introverts
 9          5          0          1          1           2
10          4          2          0          0           2
11          2          2          1          1           4
12          4          2          6          0           8
13          4          2          2          0           4
14          4          2          0          0           2
15          5          2          1          0           3
16          4          4          1          1           6
Total                 16         12          3          31
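
The two χ² values can be checked from the totals in Table 1. The paper does not say how the statistics were computed; the sketch below assumes a goodness-of-fit test against equal expected frequencies, which appears to reproduce the reported figures to within rounding. The function name is illustrative only.

```python
def chi_square_equal_expected(observed):
    """Chi-square goodness of fit against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Total commissive errors per group (extraverts, introverts), from Table 1.
group_totals = [120, 31]

# Total commissive errors per condition (A, B, C), summed over both groups.
condition_totals = [55 + 16, 36 + 12, 29 + 3]

chi2_groups = chi_square_equal_expected(group_totals)          # d.f. = 1
chi2_conditions = chi_square_equal_expected(condition_totals)  # d.f. = 2
```

Under this assumption the statistics come out at roughly 52.46 and 15.27, close to the reported 52.4 and 15.3. Note that equal expected frequencies across conditions ignore the fact that the number of catch-trials, and hence the opportunity for error, differs between conditions.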

Given that the data in Table 1 are evidently unsuitable for parametric statistics, and that the 
power of most published distribution-free analyses of variance is extremely low (Wilson, 1956), 
the interaction between personality types and the three conditions proposed in the second 
prediction was tested by performing a Mann-Whitney U test on the difference in errors between 
the two extreme conditions A and C. This gave U = 14.5, which is significant (P < 0.05) for the 
appropriate one-tailed test. While this result is in line with the prediction made, nevertheless, 
since introverts made so few errors at all, it is possible that it merely reflects the fact that the closer 
to the ‘zero error’ bound the introvert group is, the smaller must be the change in errors across 
the conditions of the experiment for these subjects. Further research under conditions in which 
the commissive error rate of introverts was higher than in this study would be useful, provided 
the distinction drawn above between the different S-inhibition effects was retained. 
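
The U statistic can be recomputed from the per-subject values in Table 1. The sketch below takes, for each subject, the errors in condition A minus the errors in condition C, and counts tied pairs as one half, which is the usual convention. That the authors treated ties this way is an assumption, though it does reproduce the reported value.

```python
def mann_whitney_u(x, y):
    """Return the smaller Mann-Whitney U; each tied pair contributes 0.5."""
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


# (errors in condition A, errors in condition C) per subject, from Table 1.
extraverts_a_c = [(7, 5), (5, 2), (4, 0), (9, 3), (14, 11), (10, 5), (4, 3), (2, 0)]
introverts_a_c = [(0, 1), (2, 0), (2, 1), (2, 0), (2, 0), (2, 0), (2, 0), (4, 1)]

extravert_diffs = [a - c for a, c in extraverts_a_c]
introvert_diffs = [a - c for a, c in introverts_a_c]

u = mann_whitney_u(extravert_diffs, introvert_diffs)
```

With these data `u` equals 14.5, matching the value reported in the text.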

A second source of commissive errors lies in the number of anticipatory responses made by 
the subject. By this is meant simply that the subject would have responded whether a signal 
appeared or not, but that where a signal did occur, a response is not an error unless clearly 
programmed before the occurrence of the signal. The limit of 80 msec was accepted as the 
criterion for an RT being defined as anticipatory. All longer RTs were accepted as genuine 
responses to the signal. 
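
The scoring rule just described can be written out as a small classifier. This is a sketch only: the category labels and the function name are hypothetical, and treating an RT of exactly 80 msec as anticipatory follows from reading "all longer RTs" as RTs above 80 msec.

```python
ANTICIPATION_LIMIT_MSEC = 80  # criterion stated in the text


def classify_trial(signal_present, rt_msec):
    """Classify one trial; rt_msec is None when no response was made."""
    if rt_msec is None:
        # No response: correct on a catch-trial, a failure to respond otherwise.
        return "missed signal" if signal_present else "correct withholding"
    if not signal_present:
        return "commissive error"
    if rt_msec <= ANTICIPATION_LIMIT_MSEC:
        return "anticipatory commissive error"
    return "genuine response"
```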

This category of anticipatory commissive errors proved illuminating. Table 2 shows the 
number of such errors in each condition for both groups. The fact that none of the introverts 
made such errors is striking and makes any statistical test between the two groups unnecessary. 
A significant difference across conditions in the predicted direction is obtained from the 
extraverts, χ² = 21.1, d.f. = 2, P < 0.01. 


Table 2. Number of anticipatory commissive errors 

                          Condition
              A (10%)    B (40%)    C (70%)    Total
Extraverts    42         44         11         97
Introverts     0          0          0          0


These results agree with the hypothesis that R-excitation is greater in the extravert. However, 
such differences between the groups in commissive errors could also possibly derive from the 
level of speed which subjects were trying to attain, that is, from extraverts (but not introverts) 
trading accuracy against speed. This possibility can be examined by comparing the reaction 
times of the two groups. 


Table 3. Mean RTs (msec) and standard deviations for the two personality groups in the three 
conditions 

                             Condition
              A (10%)          B (40%)          C (70%)
              Mean    S.D.     Mean    S.D.     Mean    S.D.
Extraverts    289     94.1     328     83.9     344     69.5
Introverts    313     74.9     321     60.6     337     60.1


Reaction times* 

* Owing to a mechanical failure, the RT data from one extravert subject (6) were unreliable, and 
his results were omitted from the RT analysis. To equate the numbers in the two groups for Anova, 
one introvert subject (14), chosen randomly, was also dropped from the RT analysis. 


On the initial task, which did not include any catch-trials, the two groups showed no significant 
difference in mean reaction time (t = 0.43, d.f. = 14, P > 0.05). Any differences under the 
experimental conditions appear, therefore, to be attributable to the introduction of catch-trials. 
Table 3 shows the mean reaction times obtained under the three conditions of the experiment. 

From Table 3 it can be seen that RT increases as the catch-trial rate rises, supporting 
prediction (2) above. Anova shows the difference across conditions to be significant (F = 110.6, 
P < 0.01). The general speed of reaction, as in the pre-test, is not significantly different for the 
two groups (F = 0.03, P > 0.05), which argues against any general tendency to trade accuracy for 
speed among the extraverts. But the two groups do differ in the effect which raising the 
catch-trial rate has on their RT. The interaction conditions × personality type is significant 
(F = 8.5, P < 0.01), and again supports prediction (2) that extraverts will be more affected than 
introverts as the catch-trial rate increases. 


The ‘involuntary rest pause’ phenomenon 


The temporary performance decrement predicted for extraverts with low catch-trial rates was 
tested in the following way. The decrement should show itself either in missing responses or in 
slow RTs. There were only three failures to respond in the whole experiment, and although the 
extraverts accounted for twice as many of these as the introverts, such a result is not quite 100 
per cent conclusive. 

Rather better evidence is provided by changes in the speed of responding. Since it is a 
short-lived decrement in performance which was predicted, the mean length of runs of RTs 
which decreased in latency was compared with that for RTs which increased in latency. If 
extraverts generate R-excitation from organizing responses, but S-inhibition from the feedback 
effects of responding, their performance should tend to increase in speed as R-excitation 
increased until the rise in S-inhibition was sufficiently strong to oppose it, at which point RT 
should lengthen. Responses providing more feedback, as in more effortful or more complicated 
responses, would be expected to build up S-inhibition at faster rates than easier responses. Since, 
in this experiment, the same response is repeated, the build-up of S-inhibition from feedback is 
assumed to proceed at a constant rate during responding. 
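
The run-length measure can be illustrated as follows. The paper does not define a run precisely, so this sketch makes some assumptions: a run is a maximal sequence of successive RT changes in the same direction, its length is the number of such changes, and two equal successive RTs end the current run without starting a new one. Mean run lengths near 1.5, as in Table 4, are what this definition gives for an unordered sequence.

```python
def run_lengths(rts):
    """Return (increasing_runs, decreasing_runs) as lists of run lengths."""
    increasing, decreasing = [], []
    direction, length = 0, 0  # direction: +1 slower, -1 faster, 0 none yet
    for previous, current in zip(rts, rts[1:]):
        step = (current > previous) - (current < previous)
        if step != 0 and step == direction:
            length += 1
            continue
        # Direction changed (or a tie occurred): close the run in progress.
        if direction > 0:
            increasing.append(length)
        elif direction < 0:
            decreasing.append(length)
        direction, length = step, (1 if step != 0 else 0)
    if direction > 0:
        increasing.append(length)
    elif direction < 0:
        decreasing.append(length)
    return increasing, decreasing


def mean_length(runs):
    """Mean run length, or 0.0 when there are no runs of that kind."""
    return sum(runs) / len(runs) if runs else 0.0


# Made-up RTs (msec) for illustration only; not data from the experiment.
example_rts = [300, 290, 280, 310, 320, 315, 330]
inc_runs, dec_runs = run_lengths(example_rts)
```

For the made-up sequence above, both `inc_runs` and `dec_runs` are `[2, 1]`, so the two mean run lengths are equal and the difference would be scored as (=).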

In the present context what is expected is that for extraverts only, where the catch-trial rate is 
low, the mean length of runs of RTs in which response latency decreased should be greater than 
the mean length of runs of increasing RTs. Table 4 shows the mean lengths of increasing and 
decreasing runs of RTs. 

Where the sign is positive, the mean length of runs in which RT became progressively faster 
exceeded the mean length of runs in which RT became progressively slower. 


Table 4. Mean run lengths 

               Condition A (10%)        Condition B (40%)        Condition C (70%)
               RT              RT       RT              RT       RT              RT
               incr.   Diff.   decr.    incr.   Diff.   decr.    incr.   Diff.   decr.
Extraverts
(1)            1.60    (=)     1.60     1.34    (+)     1.50     1.55    (-)     1.45
(2)            1.46    (+)     1.66     1.30    (+)     1.54     1.45    (-)     1.33
(3)            1.44    (-)     1.34     1.34    (+)     1.79     1.55    (-)     1.50
(4)            1.48    (+)     1.58     1.47    (+)     1.86     1.61    (+)     1.63
(5)            1.71    (+)     1.79     1.40    (+)     1.52     1.65    (-)     1.43
(6)            1.48    (=)     1.48     1.51    (+)     1.56     1.67    (-)     1.42
(7)            1.58    (-)     1.34     1.68    (-)     1.45     1.53    (+)     1.63
Introverts
(1)            1.58    (-)     1.52     1.42    (+)     1.56     1.58    (-)     1.50
(2)            1.50    (+)     1.66     1.51    (-)     1.50     1.35    (+)     1.57
(3)            1.40    (-)     1.34     1.42    (+)     1.55     1.67    (+)     1.76
(4)            1.43    (-)     1.41     1.59    (+)     1.81     1.37    (+)     1.79
(5)            1.39    (+)     1.43     1.47    (+)     1.66     1.62    (-)     1.29
(6)            1.61    (-)     1.59     1.58    (-)     1.54     1.30    (+)     1.62
(7)            1.58    (+)     1.61     1.62    (-)     1.42     1.38    (+)     1.52


Inspection of Table 4 shows that prediction (3) above is not supported in condition A. However, in 
condition B, the results are in the predicted direction although, on a binomial test, they fail to 
reach statistical significance; on the appropriate one-tailed test P = 0.06. 
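
The binomial probability quoted for condition B can be reproduced from the signs in Table 4, where six of the seven extraverts show a positive difference. The sketch assumes a one-tailed sign test with a chance probability of 0.5 for each subject; the function name is illustrative only.

```python
from math import comb


def binomial_upper_tail(successes, n, p=0.5):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1)
    )


# Condition B, extraverts: 6 positive signs out of 7 subjects (Table 4).
p_one_tailed = binomial_upper_tail(6, 7)
```

Here `p_one_tailed` is 8/128 = 0.0625, in agreement with the reported P = 0.06.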

Why the predicted effect should be at its strongest in condition B becomes a matter for 
speculation. Possibly since mean RT was at its fastest in condition A, ‘regression toward the 
mean’ occurs more often in that condition. This would reduce the mean length of those runs on 
which RT continued to decrease. 

A slightly more complex alternative interpretation is that although R-excitation was highest for 
extreme extraverts in condition A, S-inhibition deriving from feedback also exerts its strongest 
influence in that condition. It is in condition A that responses are most frequent. S-inhibition 
from feedback builds up over repeated responding but tends to dissipate in the period between 
responses. S-inhibition from feedback thus opposes R-excitation in condition A, where its effect 
is stronger than in other conditions, though for most of the time it is weak in relation to the 
strength of R-excitation in the condition. The overall speed of responding, therefore, remains 
highest in condition A, but there is an attenuating effect on runs of decreasing RT which is due 
to S-inhibition from feedback. Following this line through, in condition B, R-excitation is lower, 
but the reduction in S-inhibition from feedback allows relatively longer runs of decreasing RT. 
In condition C, the high degree of S-inhibition for extraverts resulting from low response 
demands tends, if anything, to produce runs of increasing RT for these subjects. 

The ad hoc nature of this interpretation guarantees that it matches the findings, but further 
research seems to be indicated to establish the different action of the two types of S-inhibition 
postulated. 

The theory outlined above is an attempt to amalgamate some of the general features of earlier 
theories which have been substantiated experimentally. For example, there is by now ample 
evidence to show that, in some circumstances, extraverts generate response-mediated inhibition 
faster than introverts do, as Eysenck’s (1955) reactive inhibition theory argued. There is also 
support for the later suggestion that extraverts and introverts differ in their arousability. 
Unfortunately, however, these explanations seem to be regarded at best as independent of one 
another, and at worst as alternatives. In some studies it seems that holding too narrowly to one 
or other of these explanations produces predictions which are unrealistic when the actual 
experimental task is considered. Thus, the finding by Nocita (1973) that extraverts showed 
repetitive, perseverative behaviour in an insoluble choice task ran counter to the hypothesis in 
that study that responding creates an inhibitory state in extraverts who should, therefore, tend 
not to repeat responses. 

Also, the prediction by Buckalew (1973) that extraverts would have longer simple RTs since 
introverts have lower sensory thresholds, was not supported, and it was the extraverts who 
produced shorter RTs. The present theory predicts that ‘response hungry’ extraverts will be 
faster on their initial response and respond more frequently, encompassing both these 
experimental outcomes. An experiment by Brebner & Cooper (unpublished) has confirmed 
precisely these predictions within an ‘inspect or respond’ task using coloured slides. 

Other studies of the behaviour of the two personality types under sensory deprivation and 
vigilance conditions seem to fit the view that extraverts are prone to R-excitation but 
S-inhibition, so that they tend to be impulsively active and depend upon responding to maintain 
their alertness, and, conversely, are less aroused by stimulation in the absence of activity. 
Miyashiro & Russell’s (1974) finding that extraverts did not seek longer durations of a sound 
stimulus during sensory deprivation, but did seek the stimulus more frequently, points to the 
extraverts’ need to respond actively. Morgenstern, Hodgson & Law’s (1974) finding that 
extraverts’ performance at a learning task improved under distraction while that of introverts 
worsened, is accompanied by the observation that the exploratory movements of the two groups, 
which were made to improve the artificially distorted character of the stimulus, differed from one 
another. The movements of the introverts were few and slow in comparison to the large 
extravagant movements of the extraverts. Here again the need for active responding emerges, as 
does the tendency among extraverts to vary their movements more than introverts in an 
experiment by Hill (1975). 

Finally, a relationship between impulsivity and inability to maintain attention in a vigilance task 
has been reported (Thackray, Jones & Touchstone, 1973), using items from Form A of the EPI 
to measure impulsiveness. 

No full review would be appropriate here, but the studies outlined fit the same pattern of 
extraverts seeking activity to maintain arousal rather than more passive forms of self-stimulation 
such as visual inspection. 

This does not mean that extraverts do not seek stimulation. With sufficiently intense and 
varied stimulation it is possible to maintain arousal even in the extreme extravert, but it would 
appear from the experimental evidence that active responding is more effective. 


Conclusion 


The findings of this study support the main hypothesis that R-excitation is stronger in 
extraverted individuals. From these results together with the data of the previous study (Brebner 
& Cooper, 1974) a coherent picture of the extravert is emerging which shows him to be prone to 
both types of S-inhibition and dependent upon R-excitation to maintain his responsiveness. The 
distinction between S-inhibition generated by feedback from responding and S-inhibition which 
builds up in the extravert when response demands are low, was drawn in the original statement 
of the model. 


The data of this experiment warrant further research into these different types of S-inhibition. 


Short-term serial tactual recall: Effects of grouping on tactually probed 
recall of Braille letters and nonsense shapes by blind children 


Susanna Millar 


The study tested the hypothesis that grouping has adverse effects on the recall of tactual shapes but 
facilitates the recall of tactual letters on the assumption that this depends on different processes. A further 
question was the relation of grouping to letter recall span (set-size). 

Tactually probed recall of tactually presented serial nonsense and letter shapes by blind children was 
tested under grouped and ungrouped conditions. 

Results showed a highly significant interaction between list type and grouping, and a (smaller) higher order 
interaction between set size, list type and grouping, in the predicted directions. Grouping had adverse effects 
on nonsense shape recall. Letter recall was better and was facilitated by grouping, except in subjects with 
poor letter recall spans who were also slow at letter naming. Mental and chronological age were associated 
with higher scores, but unlike set size, did not relate differentially to letter grouping. It was argued that the 
form of coding is a factor in determining the nature of processing and in recall efficiency. 


Recent studies have shown that blind subjects without access to visual codes can match tactual 
features of verbal inputs (Millar, 1977). This is analogous to findings on visual memory (e.g. 
Posner, Boies, Eichelman & Taylor, 1969) and raises questions about the nature of such 
memory, and particularly whether memory for tactual features depends on the same processes 
as visual and verbal memory. Sullivan & Turvey (1972) suggested that tactual memory, unlike 
memory for verbal (Dillion & Reid, 1969) and visual (Posner, 1967; Millar, 1972) inputs, is 
affected by length of delay rather than by attentional demands during delays. However, effects 
of distractors on tactual memory have been found by Gilson & Baddeley (1969), Sullivan & 
Turvey (1974) and Millar (1974). This might mean that the processes governing tactual memory 
are similar to those underlying memory for visual and verbal inputs, and that decrements with delay 
were due to an additional peripheral tactual ‘afterglow’. An alternative possibility is that the 
nature of processing differs with the type of code that is elicited by the task and materials or 
chosen by subjects. Thus disturbance by attentional demands during delays found in studies 
using the Brown-Peterson (Brown, 1958; Peterson & Peterson, 1959) paradigm could have been 
due to visual (Sullivan & Turvey, 1974) or verbal (Millar, 1974) recoding. Memory for tactually 
coded features, on the other hand, may not be subject to control processes in the sense used 
by Atkinson & Shiffrin (1968). Results by Millar (1975 a, b), that recall decrements on tactually 
similar serial items were shown mainly by subjects testable on smaller set sizes, while subjects 
testable on larger set sizes showed decrements mainly when serial items were phonologically 
similar, were more consistent with the latter explanation. They suggested that processes 
underlying tactual memory for serial items differed from those underlying verbal memory. The present study was 
designed to test the relation between tactual and verbal memory further by examining the effects 
of grouping serial items on tactual recall of inputs either easy or difficult to code verbally. 

There is a good deal of evidence that grouping items at presentation facilitates the recall of 
verbal material. This has been explained in terms of ‘control’ (Atkinson & Shiffrin, 1968) 
processes which can maintain items in memory. Grouping imposes pauses during which 
attentional capacity may be used to maintain inputs. It was argued that if tactual memory is 
subject to the same processes as verbal memory, grouping items should have similar effects. If, 
on the other hand, it is subject to degrading rather than to attentional controls, the pauses 
imposed by grouping should actually produce detrimental rather than facilitating effects. A 
reduced effect of grouping might be expected for nonsense shapes which are less easily coded 
verbally than for letters. But if nonsense shapes elicit modality-specific coding, support for the 
hypothesis that this depends on different processes would require an interaction showing that 
grouping improved letter recall but affected nonsense shape recall adversely. 

A further question is how the nature of coding may be related to recall spans, or the size of 
set on which children achieve above chance level scores before failing. Previous evidence 
(Millar, 1975 b) suggested that subjects testable only on small set sizes may code even items 
they can name in modality-specific rather than in verbal form. If the mode of encoding relates to 
recall efficiency, a relation of set size to differential grouping effects should be found. Only 
children who were blind from birth or early infancy were tested to eliminate possible 
confounding effects from visual recoding. The question of how tactual inputs are coded is, in any 
case, of particular interest for these subjects. 


Method 


The study tested the hypothesis that grouping increases scores on tactually presented letters but decreases 
scores on tactually presented nonsense shapes. It was suggested further that this might relate to set size. 

A set size (2 or 3, 4 or 5, 6 or more item lists) × list type (Braille letters, nonsense shapes) × grouping 
(ungrouped, grouped conditions) design, with repeated measures on the last two factors, was used in tactually 
probed recall of tactually presented serial lists with blind children, as follows. 


Subjects 


Subjects were 24 profoundly blind children (minimal light but no shape perception or total blindness from 
birth or less than 20 months of age) without known brain damage, who could reliably name test letters and 
reliably discriminate nonsense shapes (see below). There were 5 boys and 3 girls, mean age 10:2 (9:5 to 11:1); 5 
boys and 3 girls, mean age 8:9 (8:2 to 9:3); and 4 boys and 4 girls, mean age 7:6 (6:10 to 7:11). Means and ranges of 
IQ scores (Williams Test for the blind) for the three groups were comparable: 107 (90 to 140), 104 (90 to 130) and 
107 (94 to 150), respectively. Forward mean digit spans and ranges for the three age groups were 6.0 (5 to 7), 
5.0 (4 to 6) and 5.4 (4 to 7), respectively. Backward mean spans and ranges were 3.5 (2 to 5), 2.9 (0 to 5) 
and 2.3 (0 to 4), respectively. 


Set-size groups 
Subjects were divided into three set-size groups on the basis of each subject’s tactual recall span on 
ungrouped letters in the main test. A subject’s recall span or set size was taken to be that size of list of 
ungrouped letters on which he scored above chance level (60 to 100 per cent correct) before failing (50 per 
cent or less correct). Recall spans or set size on ungrouped letters were taken as the basis of comparison to 
obviate floor and ceiling effects. Subjects were tested on list lengths from two through to ten items, or up to 
and including those lengths of list on which they failed (50 per cent or less correct). Set size was that length 
of list of ungrouped letters prior to this. 

Group 1 were subjects testable on set sizes of six or more, mean age, 9:6; group 2 were subjects testable 
on set sizes of four or five items, mean age, 8:10; group 3 were subjects testable on set sizes of two or three 
items, mean age, 8:3. Each of the three set-size groups consisted of eight subjects. 
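
The assignment rule described above can be sketched in code. This is an illustration under stated assumptions, not the author's procedure: the function names are hypothetical, scores are taken as proportions correct on ungrouped letters keyed by list length, and a score falling between the failure criterion (0.5 or less) and the above-chance criterion (0.6 or more), a case the text does not cover, is treated as neither extending the span nor ending testing.

```python
def recall_span(scores_by_length):
    """Largest list length scored above chance before the first failure.

    scores_by_length maps list length (2 to 10) to proportion correct
    on ungrouped letters. Returns None if no length was passed.
    """
    span = None
    for length in sorted(scores_by_length):
        score = scores_by_length[length]
        if score <= 0.5:   # failure criterion: testing stops here
            break
        if score >= 0.6:   # above-chance criterion
            span = length
    return span


def set_size_group(span):
    """Map a recall span to set-size group 1, 2 or 3 as defined in the text."""
    if span is None or span < 2:
        raise ValueError("subject has no testable set size")
    if span >= 6:
        return 1
    if span >= 4:
        return 2
    return 3


# Made-up scores for illustration only; not data from the study.
example_scores = {2: 1.0, 3: 0.8, 4: 0.75, 5: 0.5}
example_group = set_size_group(recall_span(example_scores))
```

In the made-up example the subject passes lists of two, three and four items and fails at five, giving a span of four and membership of group 2.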


List type 

There were two types of lists: letter lists consisted of new random selections for every trial from ten Braille 
consonants previously found to be least confusable with each other tactually and phonologically (K, L, M, 
S, R, G, N, T, H, C); nonsense shape lists consisted of new random selections for every trial from ten 
(simplified) Gibson shapes, produced as dot patterns by means of a standard Braille stylus on (2 × 2.5 cm) 
Braille paper squares. Each shape was fixed to a blank plaque from the ‘Unilock Word Building Device’ set 
of individual (2 × 2.5 cm) plaques. The Braille letters were standard raised dot letters on ‘Unilock Word 
Building Device’ individual (2 × 2.5 cm) plaques. The plaques can be interlocked in any combination for easy 
presentation of a series. 


Grouping conditions 


The ‘Unilock Word Building Device’ tray was used for presentation. The tray has (24 × 2.5 cm) horizontal 
compartments in which interlocked or separate letters can be lodged securely. For ungrouped conditions the 
plaques bearing the memory list were interlocked (2 cm distance between all adjacent shapes). For grouped 
conditions, grouped shapes were interlocked, and the distance between different groups of shapes was 2.5 
cm (4.5 cm between the last shape of the first and the first shape of the second group) (2+1 and 1+1 for 
three- and two-item lists). 


Procedure and scoring 


Subjects were tested singly in a quiet room of their school. Presentation was tactual, serial and self-paced. 
Subjects explored the appropriate memory list lodged on one compartment of the tray (see above), 
sequentially with the preferred finger of their preferred hand. Immediate tactually probed position recall was 
used as in Millar (1975 a): A test (probe) item, duplicating one in the memory set, was placed at one end of 
the compartment directly beneath the memory series. Subjects felt this item immediately on finishing the 
memory set, and were to place it on the plaque bearing the same item in the row above. While subjects were 
feeling the test letter, the experimenter turned the memory series on its face, so that each item remained in 
the same position, but only the smooth backs of the plaques were available for inspection during recall.
Serial positions were tested equally in random order for every subject, list length and list type. Each subject
was tested in all four combinations of list type and grouping conditions. Conditions were blocked, in an order
counterbalanced across subjects. Subjects were informed of the conditions prior to each block of runs.

In order to retain the maximum amount of information, scores were based on proportions (subjected to
arcsine transforms for the Anovas) of correct responses on every list length for each of the four
list-type/grouping conditions. For example, a subject who scored 1.0 (100 per cent correct) on every list
length from two up to and including five items, and 0.67 on a list length of six, 0.50 on a list length of
seven, and 0.15 on a list length of eight had a total raw score of 5.32 for that condition. It should be noted
that since the maximum score on each list length was 1.0 from a list length of two items upwards, a
subject’s raw score was, of course, lower than his set size or recall span. For instance, a subject who scored
as shown above on ungrouped letters, would have a recall span or set size of six items.
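
The scoring rule and the worked example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the pass mark of 0.60 is taken from the "60 to 100 per cent correct" criterion given under Set-size groups, and the arcsine transform is assumed to be the common arcsin(sqrt(p)) form, which the text does not specify.

```python
import math

def raw_score(proportions):
    """Total raw score: sum of proportions correct over all tested list lengths."""
    return sum(proportions.values())

def set_size(proportions, pass_mark=0.60):
    """Longest list length passed (>= pass_mark) before the first failure."""
    span = 0
    for length in sorted(proportions):
        if proportions[length] >= pass_mark:
            span = length
        else:
            break  # first list length failed; stop counting
    return span

def arcsine_transform(p):
    """Assumed form of the transform applied before the Anovas: arcsin(sqrt(p))."""
    return math.asin(math.sqrt(p))

# The subject described in the text: list length -> proportion correct.
example = {2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 0.67, 7: 0.50, 8: 0.15}

print(round(raw_score(example), 2))  # 5.32
print(set_size(example))             # 6
```

Scores between 50 and 60 per cent are not classified by the criteria in the text; the sketch treats anything below the pass mark as a failure.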


Pre-test procedures 

Prior to tests, forward and backward (oral) digit spans were ascertained in the manner of Binet tests. This 
was followed by letter naming and nonsense shape discrimination tests to rule out difficulties in perceptual 
discrimination (for the same reason, presentation was self-paced). 

Letter-naming criteria were two errorless runs on naming each letter in random order. Four subjects who 
made either one or two mistakes in the first run were given training (feedback) trials of up to four runs, and 
two further criterion (no feedback) runs. Only one subject failed to reach criterion (two errorless runs) and 
was excluded. Latencies were recorded by hand on a TC 12 timer. 

Criteria for reliable discrimination of nonsense shapes were two errorless runs on same/different 
judgements of all combinations of two (same or different) interlocked shapes. Equal numbers of 
same/different combinations were presented in Gellerman-type order. Different shapes were chosen in 
random order from the ten shapes. Errors even on the first run were surprisingly low (5.75 per cent overall).
Subjects who made any mistakes were given training (feedback) runs, followed by two criterion runs. All
subjects reached criterion (two errorless runs). The main tests followed after a short break.


Results 


Mean (raw) recall scores are graphed in Fig. 1. A set size x list type x grouping Anova on (arcsine
transforms of proportion of) correct scores showed highly significant effects of set size
(F = 24.62, d.f. = 2, 21, P < 0.001); list type (F = 304.39, d.f. = 1, 21, P < 0.001); and an
interaction between these (F = 35.39, d.f. = 2, 21, P < 0.001). This meant that subjects with larger
recall spans on ungrouped letters achieved higher scores than subjects testable on smaller set
sizes; that letter lists were recalled better than nonsense shape lists, and that the difference
between set-size groups was due more to letter lists (simple effect, P < 0.001) than to nonsense
shape lists (simple effect, P < 0.05). Thus the same subjects whose recall spans on ungrouped
letters were high scored little more on nonsense shapes than subjects whose recall spans on
ungrouped letters were low.

The most important finding in relation to the hypothesis was the highly significant interaction
between list type and grouping (F = 18.73, d.f. = 1, 21, P < 0.001), and a smaller but significant
higher order interaction between set size, list type and grouping (F = 4.45, d.f. = 2, 21, P < 0.05).

Figure 1. Mean (raw) scores correct for ungrouped (U) and grouped (G) Braille letter and nonsense shape
recall by subjects with set sizes of 6 or more (set size 6), 4 and 5 (set size 4 and 5) and 2 and 3 (set size 2
and 3) on ungrouped letters.


Separate analyses showed that for letter lists grouping significantly improved scores (F = 7.54,
d.f. = 1, 21, P < 0.025). Set size had a highly significant effect (F = 69.33, d.f. = 1, 21, P < 0.001),
and this interacted with grouping (F = 4.96, d.f. = 2, 21, P < 0.025). This meant that grouping
improved letter recall by subjects with recall spans of six or more items (P < 0.05) and four or
five items (P < 0.01), but not for subjects with recall spans of two or three items. For these
subjects the effect was indeed in the opposite (negative) direction, although not quite significantly. By
contrast, for nonsense shapes, grouping had a significantly adverse effect (F = 10.56, d.f. = 1, 21,
P < 0.01), a reduced effect of set size (F = 3.36, d.f. = 2, 21, P < 0.05), and there was no
interaction between grouping and set size (F less than 1). Thus grouping had a detrimental effect
on nonsense shape recall, regardless of set size or recall span. The same results for nonsense
shapes were obtained when subjects, instead of being divided on the basis of their set sizes on
ungrouped letters, were divided on the basis of their set sizes on ungrouped nonsense shapes.


Grouping effects and individual difference variables 


The three-way interaction found above was due to group 3 and this raises questions about 
subject characteristics. Thus, age or ability may limit not only recall efficiency (set size and 
scores), but might also determine effects of grouping. Since the same subjects (group 1 and 2)
who benefited from letter grouping, not only scored considerably worse on nonsense shapes, but
also showed recall decrements when these items were grouped, subject characteristics could not 
be the whole explanation. The type of code elicited by the shapes rather than subjects’ ability 
would seem to be implicated. But subject characteristics might have affected the three way 
interaction, for instance if subjects in Group 3 were either only the youngest or least able, or 
had differential deficits also on backward digit recall, which presumably also requires some form 
of attentive or maintenance control process. If so, shape type and grouping effects should relate 
to chronological or to mental age in the same way, or possibly more strongly, as to set size, and 
backward digit spans should relate differentially to set size. Separate analyses were therefore 
carried out to test the relation of chronological and of mental age to list type and grouping, and 
of forward and backward digit spans to set size. Table 1 shows the subject characteristics of the 
three set-size groups. 


Table 1. Subject characteristics of children with set sizes of 6 or more, 4 or 5, and 2 or 3 items
on ungrouped letters

                                                        Set size
Characteristic                                     6-9     4+5     2+3
Mean chronological age                             9:6     8:10    8:3
Mean IQ                                            115     103     99
Mean mental age                                    10:10   8:11    8:2
Mean forward digit span                            6.3     5.4     4.8
Mean backward digit span                           4.0     3.0     1.6
Mean letter recall (ungrouped) score               6.1     3.9     2.1
Mean letter recall (grouped) score                 6.5     4.9     1.8
Mean nonsense shape (ungrouped) recall score       3.1     2.0     1.3
Mean nonsense shape (grouped) recall score         2.7     1.4     1.0
Mean letter recognition latency (pre-test) (sec)   1.4     1.6     3.3


Chronological age 


An Anova (on arcsine transforms of proportions) on age, list type and grouping showed that age
(F = 5.01, d.f. = 2, 21, P < 0.05) was significant, although it had a much smaller effect than that
of set size found above. More importantly, age did not interact with any other factor. Thus
although the other effects of list type (F = 72.20, d.f. = 1, 21, P < 0.001), and the interaction of
list type x grouping (F = 12.68, d.f. = 1, 21, P < 0.01) were as expected from the previous
analysis, age, unlike set size, interacted neither with grouping nor with list type and grouping. 
There are no grounds for supposing that chronological age as such was a significant factor 
contributing to the set size effect in the three-way interaction found above, or to the differential 
effects of grouping. 


Mental age 


Subjects were divided into three groups on mental age (calculated from IQ scores) and this was
the between-subject factor in an Anova on arcsine transforms of proportions correct. On letter
shapes, mental age (F = 12.94, d.f. = 2, 21, P < 0.001) was a highly significant main effect, and
was thus clearly associated with the level of letter recall. Grouping (F = 5.26, d.f. = 1, 21,
P < 0.05) significantly increased scores on letter shapes, but did not interact with mental age
(F = 0). There was thus no indication that mental age related to letter grouping in the same way
as set size. In a similar Anova for nonsense shapes, mental age (F = 5.24, d.f. = 2, 21, P < 0.025)
was also significant, although at a somewhat lower level than for letter recall. Grouping
(F = 10.34, d.f. = 1, 21, P < 0.01) significantly reduced scores on nonsense shapes as expected
from previous analyses, and did not interact with mental age.


Digit spans 

Digit spans were significantly correlated with recall scores of ungrouped letters (r = 0.81 for
forward, r = 0.67 for backward spans, both significant beyond the 0.01 level on product moment
correlations). However, an Anova on digit spans in terms of the three set-size groups produced
only the expected effect of set size (P < 0.005), and a highly significant effect showing that
forward digit spans were higher than backward digit spans (F = 93.00, d.f. = 1, 21, P < 0.001).
But there was no interaction between set size and backward digit spans, and thus no indication 
that subjects with the smallest set sizes were differentially worse on backward digits. 

The pattern of scores shown in Table 1 which most closely parallels the division between 
those who benefited from letter grouping (groups 1 and 2) and those who did not (group 3) is that 
of letter naming latencies. Mean letter naming latencies on pre-test for those with set sizes of 
two or three items were more than twice the mean naming latencies for subjects with medium or 
larger set sizes whose pre-test naming was fast. This suggests an association of naming speed
with facilitation of letter recall under grouping. 


Discussion 

The results were quite clear. Grouping had a detrimental effect on the recall of nonsense shapes,
but facilitated the recall of tactual letters, though only by subjects with larger recall spans who were
faster namers. This difference in the effects of a known variable is consistent with the
hypothesis that processes underlying memory for tactual inputs that are difficult to name differ 
from processes determining recall of easily named material. 

Adverse effects of grouping nonsense shapes had been predicted on the hypothesis that 
memory for the feel of shapes is less subject to attentional control and decays or is degraded 
more easily than verbally coded material. Clearly nonsense shapes are less familiar as well as 
less easily named, although subjects here had been trained on these. Discriminability as such was 
not a problem in view of pre-test criteria (see Method) and the self-paced presentation. But it 
might be argued that subjects also coded nonsense shapes verbally; only less efficiently. Longer 
time, or more effort spent in registering or verbally recoding less familiar items, might pre-empt 
limited capacity and so impair registration of later items, and could also reduce ‘rehearsal’. 
However, this argument would at best account only for overall lower recall scores and reduced 
effects of grouping. It would not explain why grouping should actually impair recall by subjects 
who could demonstrably benefit from grouping when shapes were letters. These data fit best 
with the assumption that subjects coded nonsense shapes as felt (since they were blind) shapes, 
and that this form of tactual registration either decays with the extra time, or is degraded by 
haptic movements across the extra spaces, imposed by grouping. It could not be identified with 
verbal short-term memory where the same type of presentation served to enhance recall, readily 
explained in terms of attentional controls. Intuitively, it seems reasonable that the feel of shapes 
can be remembered albeit less efficiently, but cannot be reiterated or controlled in the same way
as names. More importantly, the data fit with the growing number of findings that tactual 
characteristics can be used in memory, but that this behaves differently from verbal coding (see 
introduction), as well as with the evidence that features other than verbal codes survive very 
brief peripheral after-effects (Murdock, 1974). Whether this requires the assumption of separate
modality-specific stores (Wallach & Averbach, 1955), or is better explained as storing perceptible
features in an ‘episodic’ system (Tulving, 1972) or in terms of a post-categorical buffer that 
preserves characteristics of sensory registration (Broadbent, 1971) is a separate question. The 
data here do suggest, however, that the processes underlying short-term retention vary with the 
forms of encoding, as argued by Levy & Craik (1975), and that this is true also of encoding 
tactual shapes. 

The improved recall found here for grouped over ungrouped tactual letters is consonant with 
well-established findings for verbal material presented in other (auditory or visual) modalities. 
The interaction with set size, due to group 3 subjects, had been expected on the basis of 
previous results showing that subjects with low recall spans were affected by tactual rather than 
by phonological similarity in the material (Millar, 1975 a, b), suggesting that they relied on tactual 
features in memory. The result by group 3 subjects could not be explained in terms of individual 
differences in ‘rehearsal’ capacity. Their mental ages were well beyond that presumed to 
preclude this (Conrad, 1971); they were not differentially penalized on backward digit spans 
which presumably involved attentional control; and neither mental nor chronological age, 
although associated with better recall, related differentially to letter grouping. Instead, there 
were indications in the present study which parallel a previous finding (Millar, 1975 a) that group 
3 had much longer pre-test letter naming latencies than subjects who benefited from letter 
grouping. Naming speed could affect recall in two ways. For instance, if group 3 subjects relied 
on letter names, verbal recoding would be slow and this could affect total recall and reduce the 
possibility of rehearsal (see above). However, this might be expected to relate to mental and 
chronological age, and should also produce reduced rather than adverse effects with grouping. 
Alternatively, naming speed could affect the choice of code. If naming is slow, subjects might 
instead treat letters as shapes and rely on tactual features in memory. There is no reason why 
relying on tactual shape rather than on names should relate directly to chronological or mental 
age. But on the evidence from nonsense shapes, this type of recall is relatively poor and 
grouping is detrimental to it. The negative tendency for grouped letter recall by group 3 subjects 
is thus more consistent with the interpretation that group 3 treated letters as shapes rather than 
as names. The role of recoding speed in recall merits further study. But the main point implied 
by the data on nonsense shapes is that children’s recall efficiency and the type of processing can 
vary with the features that are encoded. 

The results have some implications for memory development. Firstly, set size for the same 
subjects varied widely with the material and with presentation conditions. This makes it 
extremely unlikely that immediate recall spans can be taken as direct valid indications of 
‘physiologically based limits’ at different ages as is sometimes suggested. Secondly, at least 
some of the limits on recall attributed to ‘mediational’ (Kendler & Kendler, 1962), or 
‘production’ (Flavell, 1970) difficulties, or to deficiency in rehearsal or control strategies (Hagen, 
1972) could result from relying on codes (either because the presentation elicits them, or 
alternatives are not looked for, or not available, or recoding is slow) that do not easily lend 
themselves to attentional controls. 

In summary, the hypothesis that processes underlying memory for tactual shape differ from 
those underlying memory for verbally coded material was tested by comparing blind children’s 
recall of tactual nonsense and letter shapes under conditions of grouped and ungrouped 
presentation. Results showed that grouping significantly impaired the recall of nonsense shapes, 
while letter recall was better, and improved significantly with grouping by subjects with larger 
recall spans whose pre-test letter naming was relatively fast. For subjects with small letter spans 
whose pre-test letter naming was slow, letter recall did not differ from nonsense shape recall. 
The findings supported the hypothesis, and were consistent with the assumption that processes 
underlying recall of tactual shapes are subject to decay or to degrading rather than to attentional 
controls. It was argued that the form of coding may be one factor in determining the nature of 
processing and recall efficiency.


Golden section relations in interpersonal judgement


John Benjafield and T. R. G. Green 





A model of the organization of interpersonal judgements, based on the hypothesis that people tend to 
organize their judgements in Golden Section ratios, was presented. A theory of the process of interpersonal 
judgement, based on the notion that people judge acquaintances using a Fibonacci-like decision rule, was 
then developed. A computer simulation of the theory yielded results consistent with the model. An 
experiment in which subjects judged a variety of sets of acquaintances also yielded results consistent with 
the model. 





The golden section ratio (φ) is the proportion obtaining between two quantities a and b when
a/b = b/(a+b). In order for this ratio to occur, a must be approximately 0.618 of b. Many
investigators have claimed that objects which display this ratio have a greater aesthetic value 
than those which do not (e.g. Le Corbusier, 1954; Huntley, 1970). In earlier times, the golden 
section was the subject of mystical speculation (Gardner, 1966). However, over the last century, 
a large number of experimental investigations have been devoted to determining the 
psychological properties of the golden section (Zusne, 1970). Fechner (1876) provided the first 
experimental evidence that the golden section is the most pleasing proportion, and current 
research has tended to confirm Fechner’s findings (e.g. Benjafield, 1976; Segalowitz & 
Benjafield, 1976). Recently, Benjafield & Adams-Webber (1976) have suggested that the golden 
section may play an important role in interpersonal as well as in aesthetic judgement. 
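
The defining relation a/b = b/(a+b) given at the start of this paragraph can be checked numerically. Writing x = a/b turns it into x² + x − 1 = 0, whose positive root is (√5 − 1)/2. The short sketch below is only an illustration of that arithmetic; the variable names are ours and are not part of the original paper.

```python
import math

# Positive root of x**2 + x - 1 = 0, obtained by setting x = a/b in a/b = b/(a+b).
ratio = (math.sqrt(5) - 1) / 2

# Check the defining relation with b = 1 and a = ratio * b.
a, b = ratio, 1.0

print(round(ratio, 3))                    # 0.618
print(math.isclose(a / b, b / (a + b)))   # True
```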


The golden section hypothesis 


It is a commonplace in studies of interpersonal judgement to have subjects judge acquaintances 
in terms of bipolar dimensions (e.g. pleasant-unpleasant). Osgood & Richards (1973) have 
pointed out that, typically, one of the poles of a dimension is psychologically positive, and the 
other psychologically negative. Osgood & Richards (1973, p. 380) conceive of this distinction in 
terms of the ancient Chinese concepts of Yang and Yin: ‘The underlying polarity of Yang and 
Yin. . .begins with light vs. dark and extends. . .into high vs. low, creative vs. receptive, firm 
vs. yielding, moving vs. resting and masculine vs. feminine. ... This polarity is not simply 
evaluative; it is rather a polarity between two global forces which can only be termed the 
positive and the negative.’ Thus, Osgood & Richards (1973, p. 381) conceive of the positive vs. 
negative distinction as more fundamental than any of the three semantic differential factors — 
Evaluation (E), Potency (P) and Activity (A): ‘Strong and active, as well as good, are somehow 
psychologically positive as compared with their opposites.’ Osgood & Richards’ (1973) usage
of the words ‘positive’ and ‘negative’ implies not only that adjectives which are predominantly 
E+ are ‘positive’ and their E— opposites ‘negative’, but also that adjectives which are 
predominantly P+ and A+ are ‘positive’ and their opposites ‘negative’. This usage of ‘positive’ 
and ‘negative’ has been adopted throughout this paper. 

Most dimensions used in studies of interpersonal judgement are ‘maldistributed’ (Bannister & 
Mair, 1968) in favour of their positive poles (Adams-Webber & Benjafield, 1973). That is, if 
P = the number of events subsumed by the positive pole of a dimension, and N = the number of
events subsumed by the negative pole, then P/(P+N) tends to be greater than 0.50 (Benjafield
& Adams-Webber, 1975). The golden section hypothesis (Benjafield & Adams-Webber, 1976)
implies that P/(P+N) = φ, on the average. They presented data from five experiments employing
Kelly’s (1955) repertory grid technique. In these studies, subjects allotted acquaintances to the 
positive or negative pole of a representative sample of semantic differential dimensions
(Osgood, Suci & Tannenbaum, 1957; Warr & Coffman, 1970). In all five experiments, the mean
P/(P+N) score was within 0.01 of the golden section ratio.

It is important to place these results in perspective by noting that Benjafield & Adams-Webber 
(1975) found significant individual differences in maldistribution scores, and related these to 
individual differences in cognitive structure. Furthermore, Benjafield, Jordan & Pomeroy’s (1976) 
findings suggest that an individual’s P/(P+N) ratio may change markedly as his situation 
changes — for example, it may temporarily increase in value as a result of exposure to an 
encounter group. Nevertheless, the fact that mean P/(P+N) scores reliably tend toward the 
golden section value suggests that this ratio constitutes an optimal state for interpersonal 
judgements. 

Benjafield & Adams-Webber (1976, p. 13) suggested that the golden section may be an optimal 
state for interpersonal judgements because it allows negative events to be maximally striking. 
Following a suggestion of Berlyne’s (1971, p. 232), the golden section may be a figure-ground 
relation in which the minor element is the figure. Thus, people tend to organize their 
interpersonal judgements so that negative events are seen as figure against a background of 
positive events. We are not arguing that everyone organizes their judgements so as to make 
negative events maximally striking, but rather that there is a general tendency for people 
to do so. 

It is often suggested that negative adjectives label ‘deviant’ or ‘atypical’ events (Deese, 1973; 
Zajonc, 1968). Making deviant events stand out as figure against a generally positive 
background can be seen as an adaptive way of organizing interpersonal judgements. If the 
person arranges his judgements in the golden section ratio, then he will be able to attend to
those events which are likely to give him trouble. 

This explanation of Benjafield & Adams-Webber’s (1976) results is also consistent with 
evidence that atypical events often attract more attention than do ‘normal’ ones (Berlyne, 1960; 
Berlyne & Ditkofsky, 1976). Events which are inconsistent with the person’s expectations are 
more likely to be attended to than are those which the person expects to occur as a matter of 
routine. This observation leads to some detailed predictions concerning how interpersonal 
judgements should be organized. 


The organization of interpersonal judgements 


The golden section hypothesis holds that ‘whenever people differentiate one thing into two, they 
tend to do so in a way that approximates the golden section’. Benjafield & Adams-Webber 
(1976) presented data supporting this hypothesis based only on the adjectives people use to 
describe their acquaintances. However, the golden section hypothesis should apply to the 
acquaintances described, as well as to the adjectives used to describe them. That is, people 
should tend to divide their acquaintances into a class of ‘typical’ acquaintances, and a class of 
‘atypical’ acquaintances. A typical acquaintance is here defined as an acquaintance to whom the
person assigns a majority of positive adjectives. The reason for so defining them is as follows:
since most things are described positively most of the time (Osgood & Richards, 1973;
Benjafield & Adams-Webber, 1975), most acquaintances will be assigned a majority of positive
adjectives. All acquaintances not assigned a majority of positive adjectives, including those who
receive equal numbers of positive and negative adjectives, are here defined as atypical. One
hypothesis to be tested in the present study is that 61.8 per cent of a person’s acquaintances will
be typical acquaintances. 

At this stage we are hypothesizing two golden section relations: one between positive and
negative adjectives, and one between typical and atypical acquaintances. These hypotheses can
be tested in a repertory grid task by determining the proportion of typical acquaintances
occurring in the grid. In addition, the repertory grid format allows us to specify the relation
between these two hypotheses. If, on the average, 61.8 per cent of a person’s acquaintances are
typical, then 61.8 per cent of all adjectives used in the grid must be used to describe typical
acquaintances.

To illustrate this last point, consider a repertory grid task in which the subject allots 21 
acquaintances to either the positive or the negative poles of 21 dimensions. Ex hypothesi, 13 (or
62 per cent) of these acquaintances will be typical. This means that out of 441 (21 x 21)
adjectives used in the grid, 273 (13 x 21) will be used to describe these typical acquaintances. Of
course, 273/441 = 62 per cent.

We can now specify the ratio of positive to negative adjectives used for typical acquaintances, 
and the ratio of positive to negative adjectives used for atypical acquaintances. The deductions 
which follow are based on the assumption that people organize their interpersonal judgements so 
that ‘deviant’ events are made maximally striking. In practice, what the person regards as 
deviant will vary from context to context. While negative adjectives generally describe deviant 
events, consider the case in which the person expects an acquaintance to behave in negative 
ways. When such an acquaintance behaves in positive ways, then that acquaintance is behaving 
in a deviant fashion, i.e. contrary to the way the person expects him to behave. Thus, the 
positive characteristics of atypical acquaintances are their abnormal, or deviant, features. 
Therefore, when a person considers only atypical acquaintances, then their positive features 
should be their most striking characteristics. Concretely, this means that 38.2 per cent of the
adjectives used to describe atypical acquaintances should be positive adjectives.

All of these considerations combine to yield the predicted values given in Table 1. The key to
understanding the percentages given there lies in realizing that if 38.2 per cent of all the
adjectives used to describe atypical acquaintances are positive adjectives, then 14.6 per cent
(38.2 per cent of 38.2 per cent) of the adjectives used in the entire grid will be positive
adjectives assigned to atypical acquaintances. Given that value of 14.6 per cent, the rest of the
cells in Table 1 are determined. 
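
The way the remaining cells follow from the 14.6 per cent value can be traced step by step. The sketch below is an illustrative reconstruction of that arithmetic only: it assumes the two marginal constraints stated in the text (61.8 per cent of adjectives are positive, and 61.8 per cent are applied to typical acquaintances), and the variable names are ours.

```python
import math

g = (math.sqrt(5) - 1) / 2            # golden section ratio, about 0.618

positive_total = g                    # share of all adjectives that are positive
typical_total = g                     # share of all adjectives applied to typical acquaintances
atypical_total = 1 - typical_total    # about 0.382

# 38.2 per cent of the adjectives applied to atypical acquaintances are positive.
pos_atypical = (1 - g) * atypical_total
neg_atypical = atypical_total - pos_atypical
pos_typical = positive_total - pos_atypical
neg_typical = typical_total - pos_typical

cells = {
    "positive, typical": pos_typical,
    "positive, atypical": pos_atypical,
    "negative, typical": neg_typical,
    "negative, atypical": neg_atypical,
}
for name, share in cells.items():
    print(f"{name}: {100 * share:.1f} per cent")
```

Rounded to one decimal place, the four cells come out as 47.2, 14.6, 14.6 and 23.6 per cent, and together they sum to 100 per cent.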


Table 1. Percentage of adjectives predicted for each type of acquaintance. (Observed values are
in parentheses, and simulated values are in italics.)

                                Acquaintances
Adjectives   Source       Typical    Atypical    Σ
Positive     Predicted    47.2       14.6        61.8
             Observed     (48.3)     (13.4)      (61.8)
             Sa           46.0       15.0        61.0
             Ss           44.6       14.7        59.4
Negative     Predicted    14.6       23.6        38.2
             Observed     (14.5)     (23.7)      (38.2)
             Sa           15.8       23.1        39.0
             Sp           14.2       26.4        40.6
Σ            Predicted    61.8       38.2
             Observed     (62.9)     (37.1)
             Sa           61.8       38.2
             Sp           58.8       41.2


The predicted values in Table 1 have five golden section properties: 


(1) A golden section relation obtains between the proportion of positive adjectives used (61.8
per cent) and the proportion of negative adjectives used (38.2 per cent). Data supporting this
prediction have already been presented by Benjafield & Adams-Webber (1976).

(2) A golden section relation also obtains between the number of adjectives used to describe
typical acquaintances (61.8 per cent) and the number of adjectives used to describe atypical
acquaintances (38.2 per cent). This suggests that atypical acquaintances will stand out as figure
against a background of typical acquaintances.

(3) The positive features of atypical acquaintances are figure against a background of their
negative features. The former constitute 14.6 per cent, the latter 23.6 per cent of the total. This
is a golden section relation as well, since 14.6%/(14.6% + 23.6%) = 38.2%. The reason for
positing this relationship was stated previously.

(4) Since the negative features of typical acquaintances also constitute 14.6 per cent of the
total, they also stand out when compared with the negative features of atypical acquaintances.
The model thus suggests that when we compare the negative features of typical acquaintances
with the negative features of atypical acquaintances, then we pay more attention to the former.

(5) The relationships given in (3) and (4) are those which allow the person to ‘highlight’ the
deviant features of atypical and typical acquaintances in turn. The model also allows the person
to simultaneously contrast both sets of deviant features with the positive features of typical
acquaintances, since (14.6 + 14.6)/(14.6 + 14.6 + 47.2) = 38.2%.

Properties (4) and (5) of the model are logically necessary consequences of properties (1) to 
(3). It is an attractive feature of golden section ratios that they can be nested within one another 
in this fashion. In this respect, the model is similar to one of Le Corbusier’s (1954, p. 92) 
‘panels’, where one divides a square into sections which bear golden section relations to one 
another. These sections can be combined in various ways and ‘a prodigious wealth of 
harmonious combinations is obtained’ (Le Corbusier, 1954, p. 96). In the case of the model 
presented here, the ‘harmonious combinations’ constitute the ways in which the person arranges 
his judgements so as to make their different classes bear golden section relations to one 
another. 
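
The nesting can be checked numerically. The sketch below is a minimal illustration and not part of the original analysis: it derives the four cells of the model from properties (1) to (3) alone and then confirms that the ratios named in properties (4) and (5) come out at the minor golden section share. The function and variable names are assumptions introduced here.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # golden ratio, about 1.618
MAJOR = 1 / PHI                # larger golden section share, about 0.618
MINOR = 1 - MAJOR              # smaller golden section share, about 0.382


def model_cells():
    """Derive the four cells of the model from properties (1) to (3)."""
    atypical = MINOR                  # property (2): atypical acquaintances take 38.2%
    pos_atyp = MINOR * atypical       # property (3): positives are 38.2% of that share
    neg_atyp = atypical - pos_atyp
    pos_typ = MAJOR - pos_atyp        # property (1): positive adjectives total 61.8%
    neg_typ = MAJOR - pos_typ         # property (2): typical acquaintances total 61.8%
    return {"pos_typ": pos_typ, "neg_typ": neg_typ,
            "pos_atyp": pos_atyp, "neg_atyp": neg_atyp}


cells = model_cells()

# Property (4): negative features of typical vs. atypical acquaintances.
prop4 = cells["neg_typ"] / (cells["neg_typ"] + cells["neg_atyp"])

# Property (5): both sets of deviant features vs. positive features of typical acquaintances.
deviant = cells["neg_typ"] + cells["pos_atyp"]
prop5 = deviant / (deviant + cells["pos_typ"])

for name, share in cells.items():
    print(f"{name}: {100 * share:.1f} per cent")
print(f"property (4) ratio: {prop4:.3f}; property (5) ratio: {prop5:.3f}")
```

Both ratios equal the minor share because each cell is a power of 1/φ (or twice one), so sums of neighbouring cells again stand in the golden section relation.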


The Fibonacci Decision Rule 


Thus far, we have only presented a model which attempts to answer the question ‘How are 
interpersonal judgements organized?’. It would heighten the plausibility of the model if we could 
also provide an answer to the question ‘what procedure do subjects use in order to achieve 
judgements organized in golden section ratios?’. In this section we will specify a hypothetical 
process which, if used by subjects, would yield judgements in the ratios given in Table 1. 

From the point of view presented in this paper, a repertory grid task often requires subjects to 
categorize a number of acquaintances as either positive or negative on some dimension. For 
convenience, take this number to be 13; we shall show later what happens when it is another 
number. To start with, assume that subjects rank order acquaintances on the dimension. This 
assumption is consistent with everyday experience — we are obviously capable of making ordinal 
comparisons between acquaintances, e.g. Harry is taller than Nathan — and considerable 
research has been devoted to this phenomenon (e.g. De Soto, London & Handel, 1965). The 
positive and negative labels can be assigned by working through the following steps, during 
which the labels are always assigned to the acquaintance with the most extreme rank who is so 
far unlabelled. 

Step 1: assign a P label. 

Step 2: assign a N label. 

Step 3: assign a P label. 

Step 4 and all subsequent steps: count the number of P labels assigned in the last two steps, 
and assign that many again. Do the same for N labels. (Thus, the assignments are: 
Step 4: P, N; Step 5: 2P, N; Step 6: 3P, 2N.) 

After 6 steps, the 13 acquaintances have all been labelled, eight with a P label, and five with 
an N label. The process just described is illustrated in Table 2. 

Table 2. An application of the Fibonacci decision rule to a set of acquaintances. (Labels marked * 
are those added at each step) 

                      Rank of acquaintance
Step  1   2   3   4   5   6   7   8   9   10  11  12  13  ΣP  ΣN  Σ(P+N)
1     P*                                                  1   0   1
2     P                                               N*  1   1   2
3     P   P*                                          N   2   1   3
4     P   P   P*                                  N*  N   3   2   5
5     P   P   P   P*  P*                      N*  N   N   5   3   8
6     P   P   P   P   P   P*  P*  P*  N*  N*  N   N   N   8   5   13


The interesting property of this procedure is that the total number of N labels, the total 
number of P labels and the total number of labels of either sort are always, at any stage of the 
process, three successive terms of the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, .... Each 
further term in this sequence is the sum of the two previous terms. Moreover, the ratio between 
any two consecutive terms of the sequence is an approximation to φ, an approximation which 
improves as the numbers increase (Voroboyov, 1963). Thus, this is a simple process which, if 
used by subjects in a repertory grid task, would tend to produce two classes of acquaintances in 
the golden section ratio. 
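
The rule and its Fibonacci property can be sketched in a few lines. This is an illustrative reading of Steps 1 to 4 as stated above, not the authors' program; the function names are assumptions introduced here.

```python
def fdr_steps(n_steps):
    """Return the (P, N) labels added at each step of the decision rule."""
    added = [(1, 0), (0, 1), (1, 0)]            # Steps 1 to 3: P, N, P
    while len(added) < n_steps:
        (p1, n1), (p2, n2) = added[-2], added[-1]
        # Step 4 onward: assign as many labels of each kind as were
        # assigned in the last two steps together.
        added.append((p1 + p2, n1 + n2))
    return added[:n_steps]


def running_totals(added):
    """Return (total N, total P, total labels) after each step."""
    totals, p_sum, n_sum = [], 0, 0
    for p, n in added:
        p_sum += p
        n_sum += n
        totals.append((n_sum, p_sum, p_sum + n_sum))
    return totals


steps = fdr_steps(10)
for step, (n_tot, p_tot, all_tot) in enumerate(running_totals(steps), start=1):
    # After Step 6 the triple is (5, 8, 13): three successive Fibonacci terms.
    print(step, n_tot, p_tot, all_tot, round(p_tot / all_tot, 4))
```

The last column is the proportion of P labels, which settles toward 0.618 as the number of steps grows.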

We have illustrated the process using 13 acquaintances, but what happens with 12, or any 
other number not found in the Fibonacci sequence? We suggest that the labels would simply 
overlap, so that some acquaintances are labelled twice. For example, when the number of 
acquaintances is 12, the acquaintance ranked eighth would receive both a P and an N label. On 
a rating scale, such ‘ambiguous’ acquaintances would no doubt be placed in the neutral 
category. However, in a task requiring dichotomous judgements, the subject probably assigns 
them sometimes to one pole, and sometimes to the other. 
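
The overlap can be made concrete with a small sketch. It assumes, as in the six-step illustration, that eight P labels are assigned downward from rank 1 and five N labels upward from the last rank; the function names and the fixed label counts are assumptions introduced here, since the text does not say how the number of steps would vary with the number of acquaintances.

```python
def label_ranks(n_acquaintances, n_pos=8, n_neg=5):
    """Assign P labels from rank 1 downward and N labels from the last rank upward."""
    if n_acquaintances < max(n_pos, n_neg):
        raise ValueError("sketch assumes at least as many acquaintances as labels of one kind")
    labels = {rank: "" for rank in range(1, n_acquaintances + 1)}
    for rank in range(1, n_pos + 1):
        labels[rank] += "P"
    for rank in range(n_acquaintances, n_acquaintances - n_neg, -1):
        labels[rank] += "N"
    return labels


def ambiguous_ranks(labels):
    """Ranks that received both a P and an N label."""
    return [rank for rank, label in labels.items() if label == "PN"]


print(ambiguous_ranks(label_ranks(13)))  # no overlap with 13 acquaintances
print(ambiguous_ranks(label_ranks(12)))  # the eighth-ranked acquaintance is labelled twice
```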

We have shown that if subjects tend to use the Fibonacci Decision Rule in a repertory grid 
task, then they will tend to achieve grids containing 61.8 per cent positive adjectives, that is, 
grids consistent with property (1) of the model outlined on p. 27. What we have not yet shown is 
that the use of the Fibonacci Decision Rule will also tend to yield grids that possess properties 
(2)-(5) of the model. In order to show this, we will present a computer simulation of repertory 
grid performance using the Fibonacci Decision Rule. However, it would be idle to present such a 
simulation without also presenting evidence that the predicted values of Table 1 are themselves 
consistent with experimental data. We will give this experimental evidence first, followed by the 
simulation. 


Experimental test of the model 


In everyday life we are not always confronted with a balanced or representative set of 
acquaintances, and at different times may find ourselves in the company of sets of individuals 
who vary widely in the extent to which they are typical or atypical. The following experiment 
was designed to give subjects the opportunity to construe different sets of acquaintances which 
cover a broad range in terms of the number of positive adjectives likely to be assigned to them. 

The ideas presented in this paper deal with the perception of people other than oneself. A 
theory of self-perception requires additional analysis beyond the scope of the present study, 
perhaps along the lines suggested by Hauser & Shapiro (1973). However, since it is common to 
present the self as a ‘figure’ in repertory grid tasks, we have included it in our experiment for 
the sake of completeness. This procedure enables us to compare the self with other acquaintances 
in terms of the number of positive adjectives typically assigned to it. 

Method 
Subjects 


Forty Sheffield University undergraduate volunteers served as subjects. Five men and five women were 
assigned to each of the four treatment conditions described below. 


Task 


Twelve role-titles were used to elicit the names of 12 acquaintances from each subject individually. The 
name of each acquaintance was recorded on a separate card, as was the subject’s own name. The role-titles 
were as follows: (1) The most successful person known to you personally; (2) The happiest person known to 
you personally; (3) The most ethical person known to you personally; (4) The warmest person known to you 
personally; (5) The strongest person known to you personally; (6) The most active person known to you 
personally; (1a) The most unsuccessful person known to you personally; (2a) The unhappiest person known 
to you personally; (3a) The most unethical person known to you personally; (4a) The coldest person known 
to you personally; (5a) The weakest person known to you personally; (6a) The most passive person known 
to you personally. 

Role-titles 1-3 are Kelly’s (1955) ‘Value’ roles; while 4-6 and 4a-6a were used previously by Benjafield & 
Adams-Webber (1976). Once subjects had assigned a name to a role-title, they were not allowed to repeat it. 

The definition of each of role-titles 1-6 contains a positive adjective, while role-titles 1a-6a are each 
defined by adjectives which are the negative opposites of those in role-titles 1-6. (The terms ‘positive’ and ‘negative’ 
are here being used in the way outlined previously.) These role-titles were used in order to elicit from the 
subject a broad range of his acquaintances. 

The subject was then asked to construe six of these acquaintances on 12 dimensions, as follows. (The 
method for selecting the six acquaintances will be described in the next section.) The cards containing the 
names of the six acquaintances, plus the self, were shuffled thoroughly and handed to the subject who was 
asked to sort them into two piles according to whether they were, for example, strong or weak. All 
acquaintances designated as strong (and other positive terms) were scored ‘1’, and those designated as weak 
(negative) were scored ‘0’, so that the results of subjects’ sorting could be recorded as a single six-digit 
profile. (This score is similar to Bannister & Mair’s (1968) maldistribution score.) The self was scored in the 
same fashion. Then the subject was presented with the next dimension (e.g. pleasant-unpleasant), the 
cards were reshuffled and he was asked to sort the same seven cards (six acquaintances plus the self) again 
on the basis of that dimension. This procedure was repeated until the subject had sorted the seven cards 
successively on 12 dimensions. Each dimension was written on a separate card. All 12 cards were shuffled 
thoroughly before they were presented to the subject. The dimensions used were the same as those used by 
Warr & Coffman (1970), as well as by Benjafield & Adams-Webber (1976): (1) generous-mean; (2) pleasant-unpleasant; 
(3) true-false; (4) fair-unfair; (5) active-passive; (6) energetic-lethargic; (7) sharp-dull; (8) 
excitable-calm; (9) strong-weak; (10) bold-timid; (11) hard-soft; (12) rugged-delicate. Each of the three 
semantic differential components of connotative meaning (Osgood et al. 1957) is represented by four 
dimensions: Evaluation (1-4); Activity (5-8); and Potency (9-12). For every dimension, the first adjective is 
positive, the second negative (Osgood & Richards, 1973; Benjafield & Adams-Webber, 1976). (Once again 
the terms ‘positive’ and ‘negative’ are being used in the way outlined on p. 25.) 


Experimental design and procedure 


The task just described was administered seven times (trials) to each subject in each of four treatment 
conditions. Twenty subjects received acquaintances 1-6 plus the self on Trial 1, and the other 20 received 
acquaintances 1a-6a plus the self on Trial 1. For ten of the 20 subjects in each of these two groups, the 
acquaintance receiving the most positive adjectives was then deleted. This acquaintance was determined 
separately for each subject. In the case of ties the decision was made by a coin toss. The deleted 
acquaintance was replaced by its opposite. If the subject had received 
acquaintances 1-6 on Trial 1, then the opposite acquaintance was drawn from acquaintances 1a-6a. For 
example, if ‘happiest person’ (acquaintance 2) received the most positive adjectives on Trial 1, then it was 
deleted, and ‘unhappiest person’ (acquaintance 2a) substituted for it. Similarly, if the person had received 
acquaintances 1a-6a on Trial 1, then the opposite acquaintance was drawn from acquaintances 1-6. The new 
set of six-acquaintances-plus-the-self thus derived was then construed by the subject on Trial 2, following 
the same procedure as before. The subject was asked to do the task as if ‘it was the first time you had done 
it’. After Trial 2, the acquaintance receiving the most positive adjectives was deleted, and its opposite 
inserted. The sole constraint at this stage was that the acquaintance deleted must be one of the original six 
acquaintances given to the subject on Trial 1. This procedure was repeated for a total of seven trials. Thus, 
on Trial 7, the subject was construing the opposites of the six he construed on Trial 1. 

The remaining 20 subjects received an identical procedure except that after each trial the acquaintance 
receiving the fewest positive adjectives was deleted, and its opposite substituted for it. 
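
The substitution schedule can be sketched as follows. This is an illustration of the procedure as described, not the software (if any) used in the study: the role labels are written as strings such as "2" and "2a", and `score` is a hypothetical callable standing in for the subject's count of positive adjectives for a role on a given trial.

```python
import random


def opposite(role):
    """Map a role-title to its opposite, e.g. '2' -> '2a' and '2a' -> '2'."""
    return role[:-1] if role.endswith("a") else role + "a"


def run_trials(start_roles, score, delete="most", n_trials=7, rng=None):
    """Return the set of acquaintances construed on each trial.

    After each trial, the original acquaintance with the most (or fewest)
    positive adjectives is replaced by its opposite; ties are broken at random.
    """
    rng = rng or random.Random()
    current = list(start_roles)
    remaining = set(start_roles)      # originals still eligible for deletion
    history = []
    for trial in range(1, n_trials + 1):
        history.append(list(current))
        if not remaining:
            break
        scores = {role: score(role, trial) for role in remaining}
        pick = max if delete == "most" else min
        target = pick(scores.values())
        tied = sorted(role for role, s in scores.items() if s == target)
        chosen = rng.choice(tied)     # stands in for the coin toss
        current[current.index(chosen)] = opposite(chosen)
        remaining.discard(chosen)
    return history


# Hypothetical scores: higher-numbered roles receive more positive adjectives.
history = run_trials(["1", "2", "3", "4", "5", "6"],
                     score=lambda role, trial: int(role),
                     delete="most", rng=random.Random(0))
for trial, roles in enumerate(history, start=1):
    print(trial, roles)
```

With six starting acquaintances and one substitution after each of the first six trials, the seventh trial consists entirely of opposites, as the text states.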


Results and discussion 


The total number of positive adjectives used by each subject to construe the six acquaintances 
on each trial was calculated. (Thus, these scores do not include adjectives assigned to the self. 
Data for the self will be presented later.) The means of these scores, expressed as the 
percentage of positive adjectives used on each trial, are given in Table 3. 


Table 3. Percentage of positive adjectives used in each condition. (E = experimental results; 
S_A and S_B = simulated results. Entries in square brackets are only partly legible and are 
given as scanned) 

Acquaintances  Acquaintance                                Trials
on Trial 1     deleted                1     2     3     4     5     6     7     Mean  S.D.
1-6            Most positive   E     67.1  61.1  57.4  54.2  51.0  52.1  51.9  56.4  1.1
                               S_A   74.4  67.9  59.3  54.0  49.9  47.2  48.6  57.3  1.1
1-6            Least positive  E     73.6  72.5  [74]  69.2  66.1  57.9  53.6  66.4  0.9
                               S_A   73.1  76.3  71.5  66.9  61.4  56.8  [49]  65.0  1.0
1a-6a          Most positive   E     52.2  51.1  52.8  54.2  60.6  69.4  77.2  59.6  [1]
                               S_A   48.8  50.4  53.2  58.6  63.9  71.0  73.5  59.9  [1]
1a-6a          Least positive  E     46.8  55.7  62.8  69.0  74.0  73.5  71.0  64.7  1.2
                               S_A   46.8  55.0  59.6  62.5  64.9  70.0  74.6  61.9  1.1

Mean                           E     59.9  60.1  [61]  61.7  62.9  63.2  63.4  61.8  1.1
S.D.                                 1.3   1.1   1.1   0.8   0.9   1.2   1.5
Mean                           S_A   60.8  62.5  60.9  60.5  60.0  61.3  61.4  61.1  1.0
S.D.                                 1.4   1.1   0.8   0.7   0.8   1.1   1.3
Mean                           S_B   60.2  59.9  59.4  58.8  58.1  59.4  59.8  59.4  1.1
S.D.                                 1.4   1.1   0.9   0.7   0.8   [1]   1.5





A summary of the analysis of variance performed on these data is given in Table 4. The triple 
interaction implies that each of the four treatment conditions produced unique profiles across 
trials. If subjects are given acquaintances 1-6 on Trial 1, then the percentage of positive 
adjectives used decreases as a function of trials, but the rate of decrease is greater over the first 
few trials if the acquaintance deleted is that which received the greatest number of positive 
adjectives. If subjects are given acquaintances 1a-6a on Trial 1, then the percentage of positive 
adjectives used increases over trials, but the rate of increase is greater over the first few trials if 
the acquaintance being deleted is that which received the fewest positive adjectives. 
The A×C and B×C interactions simply highlight these trends. The former emerges because 
subjects given acquaintances 1-6 on Trial 1 generally show decreasing scores as a function of 
trials, while those given acquaintances 1a-6a on Trial 1 generally show increasing scores as a 
function of trials. The B×C interaction is the result of the differing rates of change due to the 
type of acquaintance deleted on each trial. The main effect of B is due to an overall tendency to use 
fewer positive adjectives if the acquaintance deleted on each trial is that which received the 
greatest number of positive adjectives. None of these effects is surprising, and they combine to 
indicate that the experiment achieved what it was designed to do: sample subjects’ construals of 
a wide variety of sets of acquaintances. 


Table 4. Summary of analyses of variance of experimental and simulated (S_A and S_B) data 

                                                         Experiment   S_A          S_B
Source                                         d.f.      F            F            F
A. Type of acquaintance on Trial 1             1, 36     <1           <1           <1
B. Type of acquaintance deleted on each trial  1, 36     17.66**      39.89**      60.60**
A×B                                            1, 36     1.87 n.s.    13.32**      <1
C. Trials                                      6, 216    1.93 n.s.    1.53 n.s.    1.05 n.s.
A×C                                            6, 216    51.18**      228.20**     225.21**
B×C                                            6, 216    10.13**      10.20**      10.32**
A×B×C                                          6, 216    3.05**       3.46**       <1
** P < 0.01. 

In order to determine the goodness of fit of these data with the proposed model of 
interpersonal judgement, the data were reanalysed as follows. Since 40 subjects completed the 
repertory grid task with six acquaintances on seven separate trials, the pooled data consist of 
1680 acquaintances, each construed on 12 dimensions. Of these, 1057 acquaintances, or 62.9 per 
cent, had a majority of positive adjectives assigned to them, i.e. were typical acquaintances. The 
20 160 adjectives used to construe acquaintances were distributed as follows: (1) 9745, or 48.3 
per cent, were positive adjectives assigned to typical acquaintances; (2) 2929, or 14.5 per cent, 
were negative adjectives assigned to typical acquaintances; (3) 2708, or 13.4 per cent, were 
positive adjectives assigned to atypical acquaintances; and (4) 4778, or 23.7 per cent, were 
negative adjectives assigned to atypical acquaintances. These percentages are reported as the 
observed values in Table 1. They are all within 1.5 per cent of the predicted values. The 
proposed model of interpersonal judgement is consistently confirmed by these data. 
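
The comparison with the predicted values can be reproduced from the counts just given. The sketch below is only an arithmetic check; the predicted percentages are those stated for the model (47.2, 14.6, 14.6 and 23.6), and the dictionary keys are labels introduced here.

```python
observed_counts = {
    "positive, typical": 9745,
    "negative, typical": 2929,
    "positive, atypical": 2708,
    "negative, atypical": 4778,
}
predicted_pct = {
    "positive, typical": 47.2,
    "negative, typical": 14.6,
    "positive, atypical": 14.6,
    "negative, atypical": 23.6,
}

total = sum(observed_counts.values())   # 20 160 adjectives in all

deviations = {}
for cell, count in observed_counts.items():
    observed_pct = 100 * count / total
    deviations[cell] = observed_pct - predicted_pct[cell]
    print(f"{cell}: observed {observed_pct:.1f}, predicted {predicted_pct[cell]:.1f}, "
          f"difference {deviations[cell]:+.1f}")

largest = max(abs(d) for d in deviations.values())
print(f"largest absolute difference: {largest:.2f} percentage points")
```

The two larger differences fall in the cells for positive adjectives; both remain inside the 1.5 per cent band mentioned in the text.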


The self 


The pooled results yield data on 280 self figures, each construed on 12 dimensions. Of these, 
208, or 74.3 per cent, had a majority of positive adjectives assigned to them. Of the 3360 
adjectives used to construe the self, 2237, or 66.6 per cent, were positive. These results suggest 
that people are, on the whole, more likely to assign positive adjectives to themselves than they 
are to their acquaintances. However, Hauser & Shapiro (1973) have shown that the self-image 
can be quite labile, and easily influenced by the perspective from which the self-report is 
undertaken (e.g. ideal self, private self, etc.). It would be useful to have data comparing the 
percentage of positive adjectives assigned to the self across a wide range of self-perspectives, 
just as we have data comparing the percentage of positive adjectives assigned to a wide range of 
acquaintances from the present study. Such information would help us determine more clearly 
whether or not people are typically more generous to themselves than they are to others when it 
comes to the assignment of positive characteristics. 

A computer simulation of the Fibonacci Decision Rule 


Having presented evidence that positive and negative judgements of acquaintances are 
accurately predicted by the model given in Table 1, we can now present a computer simulation 
of the Fibonacci Decision Rule (FDR). As pointed out above, the simulation is designed to 
determine whether the FDR, if used by subjects to perform a repertory grid task, would yield 
classes of judgements in golden section relations to one another. If so, the FDR can stand as a 
theory of how subjects sort acquaintances in a task requiring either a positive or negative 
judgement to be made of each acquaintance. Since many repertory grid tasks have this format, 
such a theory is obviously worth pursuing. 

The program used to conduct the simulation was written in the language ‘Glue’ (Green, 1973; 
Green & Guest, 1974), which is intended to provide both a convenient programming language 
and a communication medium comprehensible to users of other languages. (Copies of the 
program are available from the second author.) The basic operations of the portion that fills out 
the repertory grid may be described as follows: 

(1) Generate ranks for all the 12 acquaintances on all 12 dimensions. 

(2) Determine the average number of acquaintances to be judged positive on each dimension. 
(The FDR does this automatically. The required number is 7.4.) 

(3) Assign positive or negative labels to each acquaintance on each dimension. If the rank of 
an acquaintance on a dimension is less than 8, give it a positive label; if greater than 8, give it a 
negative label; if exactly 8, give it a positive label with probability 0.4, otherwise give it a 
negative label. 
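
Operations (1) to (3) can be sketched in Python. This is not the original Glue program, which is not reproduced in the text; it is a minimal reconstruction that assumes ranks are drawn independently for each dimension (as for uncorrelated dimensions), and all names are introduced here for illustration.

```python
import random

N_ACQUAINTANCES = 12
N_DIMENSIONS = 12
CUTOFF_RANK = 8       # ranks below this are positive, above it negative
P_TIE_POSITIVE = 0.4  # chance that the acquaintance ranked exactly 8 is positive


def simulate_grid(rng):
    """Return a grid of 0/1 labels, one row per dimension, one column per acquaintance."""
    grid = []
    for _ in range(N_DIMENSIONS):
        ranks = list(range(1, N_ACQUAINTANCES + 1))
        rng.shuffle(ranks)            # ranks[i] is the rank of acquaintance i
        row = []
        for rank in ranks:
            if rank < CUTOFF_RANK:
                row.append(1)
            elif rank > CUTOFF_RANK:
                row.append(0)
            else:
                row.append(1 if rng.random() < P_TIE_POSITIVE else 0)
        grid.append(row)
    return grid


def positive_fraction(grid):
    """Proportion of positive labels in a grid."""
    cells = [label for row in grid for label in row]
    return sum(cells) / len(cells)


rng = random.Random(1977)             # arbitrary seed, for repeatability only
fractions = [positive_fraction(simulate_grid(rng)) for _ in range(500)]
print(sum(fractions) / len(fractions))   # close to 7.4 / 12, i.e. about 0.617
```

Each dimension yields either seven or eight positive labels, so the long-run average is 7.4 of 12, which is how the rule constrains the simulation to roughly 62 per cent positive adjectives.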

The experimental procedure previously described was carried out using the program to mimic 
each subject. For the two groups which started with acquaintances 1-6 on Trial 1, the 
acquaintances used in the simulation were those ranked 1-6 on the first dimension. For the two 
groups which started with acquaintances 1a-6a on Trial 1, the acquaintances used in the 
simulation were those ranked 7-12 on the first dimension. For each ‘subject’ there were seven 
trials. After each trial, whichever acquaintance from the starting set received the most (or least) 
positive labels was replaced with an acquaintance from the other set. 

Two sets of simulated results are reported in Tables 1, 3 and 4. The first set, labelled S_A, 
represents data from a population of simulated ideal subjects, for whom all the dimensions are 
orthogonal and therefore have intercorrelations of zero. This simulation is of theoretical interest 
since a state in which all dimensions are orthogonal corresponds to one of the definitions of 
maximal ‘cognitive complexity’ (Bieri, 1955; Bannister & Mair, 1968, p. 184). It is often argued 
that such a multidimensional construct system is optimal because it allows the person to make 
more differentiated, precise predictions concerning the behaviour of acquaintances. For reasons 
given on page 26, the predicted values of Table 1 may correspond to an optimal state as well. 
Thus, if S_A generates an outcome consistent with those predicted values, then we are justified in 
believing that our model of interpersonal judgement is consistent with at least one definition of 
an optimal information-processing strategy, i.e. ‘cognitive complexity’. In fact, inspection of 
Table 1 reveals that S_A generates values within 1.5 per cent of the predicted ones. 

The second simulation, S_B, is of subjects who are somewhat less ‘ideal’ than those simulated 
in S_A, and represents data from a population of simulated human (rather than ideal) subjects. 
Consistent with Warr & Haycock’s (1970) finding that the three semantic differential factors 
emerge reliably for dimensions of interpersonal judgement, the dimensions used in the simulation 
also form three groups. The intercorrelation between these three groups was set at approximately 
zero. The intercorrelation of dimensions within each group was arbitrarily set at approximately 
0.50, so that dimensions within each group are significantly intercorrelated. In addition, this 
simulation utilizes a ‘flattening process’ whereby the number of positive labels assigned to each 
acquaintance by each subject is adjusted toward a mean of 6 by subtracting 20 per cent of its 
deviation from that value. This flattening process, or one very like it, is probably used by real 
subjects as well, since inspection of the experimental data shows that very few acquaintances are 
assigned all positive, or all negative, labels. Thus, building such a process into the program is an 
attempt to specify another possible mechanism underlying repertory grid performance. Table 1 
shows that S_B fits the model quite well, all values being within 3 per cent of the predicted ones. 
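
The flattening adjustment itself is a one-line computation. The sketch below shows only that arithmetic; how the program turned the adjusted (generally fractional) count back into individual labels is not stated in the text, so that step is left out. The function name and parameter names are assumptions.

```python
def flatten(n_positive, target=6.0, rate=0.20):
    """Move a count of positive labels toward the target by a fixed share of its deviation."""
    return n_positive - rate * (n_positive - target)


for n in (0, 3, 6, 9, 12):
    # An acquaintance with all 12 labels positive is pulled to 10.8; one with none to 1.2.
    print(n, round(flatten(n), 2))
```

Because extreme counts are pulled furthest, acquaintances with all positive or all negative labels become rare, which is the feature of the experimental data the process was meant to reproduce.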

Inspection of Table 3 shows that both simulations lead to approximately the same trial effects 
as found in the experimental data, and that the marginal means and S.D.s fit well, in some cases 
remarkably well. The analyses of variance performed on the simulated data, given in Table 4, 
indicate that each of the simulations has its own virtue. The three largest F ratios produced by 
the analysis of the experimental data are also highly significant F ratios in both analyses of the 
simulated data. However, the analysis based on S_A yields one significant interaction too many, 
and the analysis based on S_B yields one significant interaction too few. Thus, while it is obvious 
that neither simulation is perfect, it is equally obvious that both simulations capture the major 
features of the experimental data. Taken together, the two simulations are encouraging for the 
hypothesis that subjects tend to use the FDR to assign positive and negative labels to 
acquaintances in order to achieve maximal contrast between positive and negative events. 

Given that the simulations were relatively successful, it is important to spell out their logical 
status. Both simulations were constrained to give the result that approximately 62 per cent of all 
adjectives assigned were positive, but, as pointed out earlier, they were not constrained in any 
obvious way to yield the other predicted values in Table 1. The fact that the simulations give 
good approximations to those predicted values implies that the use of the FDR will necessarily 
result in repertory grids which possess all of properties 1-5 given on pages 27-28. It follows that 
this conclusion could have been reached using purely analytical techniques, but computer 
simulation techniques were considerably easier to employ. The latter have shown that there is a 
logically coherent relationship between the FDR and the model presented in Table 1. Thus, 
future research can reasonably be directed toward the further exploration of the hypothesis that 
subjects tend to use the FDR in repertory grid tasks. 


Some suggestions for future research 


The distinction between positive and negative adjectives, which is central to the present study, 
has developed out of research with the semantic differential (Osgood & Richards, 1973). 
Consequently, we have tested some of the implications of the golden section hypothesis by 
supplying subjects with dimensions for which the positive and negative poles are known. In 
principle, the golden section hypothesis should also apply across a broad range of situations 
requiring either positive or negative judgements, in addition to those utilizing a repertory grid 
format. A concrete example of one possible application is to situations like that studied by Eiser 
& Eiser (1975). They examined the degree to which subjects view future events as both probable 
and desirable. A useful experiment would be to have subjects divide a set of possible future 
events into those which are probable (vs. improbable) and also into those which are desirable 
(vs. undesirable). Our model (Table 1) implies that events which are both desirable and 
probable should constitute 47.2 per cent of the total; those which are undesirable and improbable 
23.6 per cent; while those which are either desirable but improbable or undesirable but probable 
should each occur 14.6 per cent of the time. Should subjects organize their judgements in this 
fashion, it would lend further support to the notion that the golden section is preferred because 
it enables subjects to highlight classes of atypical events against a background of more typical 
events. Studies like the one just outlined would help determine the generality of the golden 
section hypothesis. 

Most research on the golden section has been concerned with whether or not it is, in fact, a 
preferred proportion (Zusne, 1970). Few studies have explored cross-cultural and developmental 
determinants of golden section preferences. Given the growing evidence for the psychological 
reality of the golden section, such studies are increasingly appropriate, and should shed light on 
how preferences for golden section relations arise and are maintained in everyday life. 


Constructive properties of theoretical models 
Stephen A. Sharp 





This paper isolates various properties of theoretical models which are conducive to the successful fulfilment 
of the functions of theory in the progress of scientific research. These properties are summarized in the 
concept of value which refers to the statistical accuracy of a given model along with its complexity which is 
defined in terms of the testable axioms of the model. These issues are then discussed in the context of 
specific examples from the psychological research literature. 





Concepts 

Lists of the functions which can be or should be fulfilled by models in social sciences and details 
of those features of models which are important in ensuring that these functions are properly 
fulfilled have been presented by several past writers (e.g. Apostel, 1960; Braithwaite, 1962; 
Lachman, 1963; Chapanis, 1963, etc.). It is not the purpose of this paper to duplicate these 
comments or to amplify them but instead to examine some more specific characteristics of 
models in social science which are relevant to whether or not the model will be successful in 
helping to further understanding of the behaviour in question. Unlike in most other papers, the 
points raised here are further discussed in the context of particular examples, which enable the 
significance of various aspects of the models to be more easily appreciated. 

We begin with a consideration of some major concepts underlying the role of models in 
research. The first of these will be the concepts of the default axiom and default prediction. In 
order to facilitate the discussion of these ideas, it is useful to observe first that from an 
experimental point of view, a theoretical model may be regarded as a mechanism for relating 
the independent variables of the experiment to the dependent variables. For instance, in a 
paired-associate learning paradigm, a model may be required to relate an independent variable 
such as trial number to a dependent variable such as the probability of a correct response. Now 
it is usually the case that experiments may be extended either by introducing new independent 
variables (such as item difficulty) or new dependent variables (such as response latency). 

Often in such cases it will be found that the model for the original experiment was not stated in 
sufficient detail or did not have sufficient scope to allow the new variables to be related fully to 
each other. The original experiment may for instance have been to test for incremental as 
opposed to all-or-none learning, and in the statement of the models, emphasis would be laid on 
those axioms which differentiate the rival models in this respect. Thus it may well be that neither 
model contains an axiom about subject homogeneity although in any experiment involving more 
than one subject, some assumption about subject differences must be made before the 
predictions from the models may be generated. It is therefore necessary to ‘assume’ an axiom 
into the model for the purposes of evaluation. Such axioms will be called default axioms as they 
are often not stated as part of the model. 

Default axioms usually refer to some independent variable not being a factor in determining 
the results and, in the case at present under discussion, are obligatory in that those independent 
variables which exist in the experiment but are not systematically studied (subject age, item 
difficulty, etc.) must be in some way related to the dependent variable(s). The predictions 
resulting from this process will be termed default predictions, which usually follow in a fairly 
straightforward manner from default axioms. The relationship between these axioms and 
predictions is amplified by example below. 

The obligatory nature of default axioms applies when additional independent variables are 
being related to the original dependent variables. The position differs when additional dependent 
variables are considered. Here, such as when confidence ratings are introduced into a learning 
paradigm, the original model may not yield any predictions in terms of the new observables. In 
such a case, it may be extended by means of adding new axioms or incorporating optional 
default axioms or it may be left with its original, smaller scope of predictions in which case, 
while it remains valid within its original terms, it becomes necessary to recognize that there is 
some aspect of the behaviour which is being missed by the model. 

Examples will be presented below of these various types of default axiom and prediction, and 
hopefully they should clarify the role played by these parts of theoretical models in 
psychological research. 

The next idea is that of accuracy, which is the extent to which a model provides a good fit to 
the data and may be assessed by methods ranging from visual inspection to chi-square tests. The 
technical aspects of accuracy are discussed by Sternberg (1963) but other research aspects are 
also significant. It is important to realize that there is a subjective element in assessing the 
relative importance of different types of discrepancy between the predictions from a theory and 
the observations from an experiment. In this context Rescher & Helmer (1959) have advocated 
the use of professional expertise in evaluating the ability of the model to handle various aspects 
of the data and in deciding tolerable limits. 

With intractable models, it is sometimes the case that visual inspection is the only available 
method of comparing the accuracies of the two models. It may then be important to study the 
performance of human beings as information extractors and processors in these circumstances. 
This is especially important when researchers evaluate their own models in this way, since 
experimenter bias becomes more likely. It seems much more advisable simply to present the 
graphs and tables in publications, etc., and to allow other researchers to evaluate them for 
themselves. 

Technical aspects of the evaluation of statistical accuracy have been extensively discussed in 
the literature of statistics and should properly be included in such a paper as the present one. 
However, the topic is sufficiently large and complex to make it amenable to discussion only 
where more space is available (cf. Sharp, 1976). 

The empirical content of a model refers to the range and detail of the predictions which can be 
derived from it, or, in other words, the total number of relationships between dependent and 
independent variables which are implied by the model. Often, empirical content will be found to 
be equally great for different models when the effect of default axioms and predictions has been 
allowed for. If one model provides predictions about, say, the distribution of response times, it 
will be possible to derive similar predictions from other models, even if they are predictions of 
constancy resulting from an optional default axiom. 

Empirical content includes default predictions which, in statistical terms, often represent
strong, easily testable null hypotheses, and thus they increase the falsifiability of the model 
(Popper, 1968). The familiarity of this concept makes it unnecessary to say much about it here. 
But it is worth noting that falsifiability is, for present purposes, most important in extreme cases 
where a model yielding no testable predictions at all is clearly of little value in furthering 
empirical research while, at the other extreme, a model yielding deterministic predictions may, if 
it is difficult to modify, lose much of its potential usefulness if it is proved false. A little 
flexibility may be useful in a model to take account of successively more detailed experimental 
results as research develops. 

A particularly important distinction with respect to empirical content is that between models 
whose predictions are stated in terms of experimental events such as individual responses and 
those whose predictions are less specific. The latter are clearly much less falsifiable since they 
usually predict only the existence of general ‘trends’ in the data, and the investigation of these 
‘trends’ usually involves summing over several experimental dimensions (such as subjects) and
plotting graphically or analysing statistically two or more samples with large error variances
which make discrimination between models and the evaluation of individual models very 
difficult. 

It is often the case that such non-specific models embrace a range of statistically identical 
submodels between which experimental discrimination is impossible but which may well differ in 
ways which are very important as regards understanding the behaviour in question. For instance, 
the memory scanning models discussed by Sternberg (1966) are non-probabilistic in the above 
sense and each of the models embraces a variety of special cases which have differing response 
selection mechanisms. The distinguishability of psychologically significant details within the 
context of a general model is an important but complex topic which merits discussion but for 
which there is insufficient space here. 

Complexity refers to the number and details of the testable axioms (excluding optional default 
axioms) of a model. In psychological terms, it refers to the number of ‘mechanisms’ such as 
memory stores or information counters postulated and to the detail with which each must be 
described to enable the model to be applied in research. 

Also relevant is the ease of derivation of the predictions (i.e. tractability) as well as the 
specificity of the predictions, which relates complexity to empirical content. For logically related 
models, complexity may often be related to the number of parameters if the model is sufficiently 
quantified, but this is not necessarily so for completely distinct models. This in fact leads to the 
possibility of having several criteria for complexity such as axiomatic and parametric, but 
comments relevant to this point by Gregg & Simon (1967) and below suggest that the axiomatic 
criterion is the most useful one for social sciences. 

The concept of complexity is by nature difficult to discuss in abstract terms and is more 
appropriately left to the examination of specific examples later. Consequently, we proceed to the 
final notion to be discussed here which is that of the value of a model. This is, in a sense, the 
culmination of the six notions discussed above. The value of a model is the extent to which it is 
useful in research terms in furthering understanding of the behavioural processes in question. A 
good conceptual index of the value of a model is the ratio (other things being equal) between its 
accuracy and its complexity. For a given level of complexity, a more accurate model is clearly 
to be preferred while of two equally accurate models, the simpler is again clearly likely to be the 
more useful in research. 

The definition of complexity clearly affects the definition of value and it is for this reason that 
the axiomatic view of complexity is to be preferred to the parametric. Specifically, the 
parameter-based view of complexity would confuse the ideas of complexity and empirical 
content because, as it was outlined above, the concept of complexity refers primarily to axioms 
rather than predictions while issues such as parameter estimation are more relevant to the 
empirical content of the model. The examples given below also support the axiomatic view of 
complexity. Gregg & Simon (1967) suggest the number of words necessary to state the axioms as 
a measure of complexity, but for present purposes a numerical measure is not so important as an 
intuitive recognition that complexity can vary from one model to another and that it is important 
in determining a model’s value. 

It should also be stated that complexity as a determinant of value does not involve optional 
default axioms. This means that a strong, simple model may still be exploited if it accounts fairly 
well for major aspects of the data. However, default predictions from obligatory default axioms 
must be counted in assessing accuracy since they may form a significant part of the predictions 
of a simple model. 

It should be clear from the above discussion that the ideas are related to each other and are 
brought together in the overall concept of value. Specifically, it integrates accuracy (and thus 
default predictions and empirical content) with complexity (and thus the two types of default 
axiom). The role played by each of these ideas is brought out more fully below but the preceding
comments are adequate to provide an outline of the conceptual framework to be illustrated
below.

Finally, it should be noted that the above list is not necessarily complete. Many other issues
could have been raised in connection with model usage and evaluation, but it is argued that the
list contains those concepts of principal relevance and usefulness and that any other necessary
matters have been discussed in the literature of the philosophy of science (Suppes, 1960; Simon 
& Newell, 1963; Marx, 1963, and others). 

We now proceed to examine two selected examples of models proposed in the psychological 
literature. These examples illustrate particularly well the importance of the matters raised above, 
and the ways in which they determine the potential usefulness of models in furthering research. 


Examples 

The first example involves three models for concept formation proposed by Restle (1962), 
Falmagne (1970) and Wickens & Millward (1971). In the interests of space, these models will be 
described only briefly; details may be gained from the references. 

The Restle model states that at any time, the subject holds one hypothesis which he tests until 
it is falsified at which point he restarts from scratch. Hypothesis sampling probabilities are 
stationary and independent and the subject is assumed to have no memory. After solution, the 
process is static and deterministic. 
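
The process described in this paragraph can be sketched as a short simulation. This is an illustrative reading only, not Restle's own formulation: it assumes a pool of equally likely hypotheses of which exactly one is correct, and that a wrong hypothesis happens to yield a correct response with a fixed chance probability. The names `simulate_restle`, `n_hypotheses` and `p_chance` are hypothetical.

```python
import random


def simulate_restle(n_hypotheses=4, n_trials=30, p_chance=0.5, rng=None):
    """Simulate one subject; return a list of booleans (True = correct response)."""
    rng = rng or random.Random()
    correct = 0  # arbitrary label for the one correct hypothesis (assumption)
    current = rng.randrange(n_hypotheses)  # initial sample, equal probabilities
    outcomes = []
    for _ in range(n_trials):
        if current == correct:
            # After solution the process is static: no further errors occur.
            ok = True
        else:
            # A wrong hypothesis agrees with the feedback only by chance.
            ok = rng.random() < p_chance
        outcomes.append(ok)
        if not ok:
            # Falsified: restart from scratch, sampling with replacement
            # because the subject is assumed to have no memory.
            current = rng.randrange(n_hypotheses)
    return outcomes


outcomes = simulate_restle(rng=random.Random(1))
```

Averaging such runs over many simulated subjects gives the probability of a correct response as a function of trial number, which is the dependent variable for which the model was originally proposed.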

The Falmagne model generalizes this by introducing ‘weights’ for hypotheses which determine 
the sampling probabilities, increasing following verification and decreasing following falsification. 
Resampling may occur at any time, or no hypothesis at all may be held. 

Wickens & Millward generalize the Restle model by postulating that the subject tests S 
hypotheses in parallel until he finds the correct one or rejects them all, in which case he 
transfers the whole cluster to a reject buffer and samples another S hypotheses. The buffer can 
hold L clusters beyond which they are returned to the original pool. 

The Falmagne model allows the possibility of no hypothesis on a trial but not the possibility of 
several while the reverse is true of the Wickens—Millward model. Also the Falmagne model 
postulates no memory other than reflected in the weights of the hypotheses while the 
Wickens—Millward model specifically postulates a memory but refers only to the ‘saliency’ of 
hypotheses in determining the sampling order. 

The sampling axioms of the Falmagne model are the more detailed as they specify sampling 
probabilities which the Wickens—Millward model does not do, and its response axioms also 
differ: the Wickens—Millward model assumes that responses are made at random until only the 
one hypothesis is left in the sample, or that one hypothesis is selected and responses made 
according to it until it is rejected. 

One other point needs to be made before the models can be discussed: the two 
generalizations of the Restle model were made to allow for its inadequacies. Erikson, Zajkowski 
& Ehmann (1966) found that response latencies fall rapidly immediately following the final error. 
The Falmagne model explains this by postulating that the weight of the correct hypothesis tends 
to increase over trials while the weights of all others tend to vanish, although the model provides 
no mechanism for the changing of these weights. The Wickens—Millward model accounts for the 
Erikson finding on the ground that there is no time spent in testing several hypotheses against 
the experimental outcome of the trial since all hypotheses save one have been rejected. 

We may now discuss the issues mentioned above in the context of this example. The Restle 
model was originally proposed for an experimental paradigm whose independent variable was 
trial number and whose dependent variable was the probability of a correct response. Erikson’s 
experiment involved the introduction of a second dependent variable which brought to light 
further empirical findings. Restle’s original model provides no explanation for these due to its
limited scope. If it is to be made to provide a prediction for response latency without being
modified as Falmagne or Wickens & Millward did, that prediction would be one of constant 
latency because the static nature of the Markov process after absorption provides no reason why 
latencies should vary. 

This then is an example of an optional default axiom and prediction of constant reaction time 
which, if accepted, considerably reduces the model’s accuracy, which is otherwise good. In such 
cases the distinction between accepting and rejecting such an optional default axiom is the 
distinction between having a wrong explanation and having no explanation at all. Either way 
there is some evidence that the model should be investigated further with a view to making it 
more adequate as an account of the observed responses. 

It is relevant to consider Suppes’ (1960) comment that models in the social sciences are 
designed primarily to account for the major aspects of the data and if the Restle model does 
this, it may still have a useful role to play among models for concept identification. 

Apart from this, however, we can see that the two more general models are straightforward 
examples of modifications to improve data accuracy. Clearly, then, accuracy and complexity
increase as default axioms are replaced, and falsifiability is thereby reduced.

The Falmagne model is the more complex as is clear from a comparison of the axioms of the 
two models as laid out in the Falmagne and Wickens—Millward papers. It is considerably less 
tractable since Falmagne is able to derive only some non-parametric predictions from the model,
even this being a long, involved process. Indeed, in this case complexity becomes
involved with empirical content in that the model’s intractability makes many predictions (which
presumably exist) impossible to explore. The model thus becomes so much less falsifiable that its 
empirical adequacy is difficult to assess, which, at this extreme, becomes a serious defect in the 
potential usefulness of the model. 

The above argument has been leading up to one overall conclusion: that the Wickens—Millward 
model is more valuable than the Falmagne model. While the former has the advantages of 
plausibility, simplicity and tractability, the latter has features which make it less attractive in 
research terms — it is difficult to handle, somewhat ad hoc in nature and requires many axiomatic 
details to enable it to improve the accuracy of the Restle model. The difference lies in the nature 
of the increased complexity of the generalized models; the Falmagne approach is to build a
statistical ‘machine’ which generates data more similar to observed data than can the Restle 
model while Wickens & Millward ask how subjects solve the experimental tasks, if it is not in 
the way proposed by the Restle model. 

This example has provided an instance of how different ways of generalizing a simple model 
may show different degrees of value and has illustrated some of the properties of models which 
are helpful in achieving a high accuracy/complexity ratio. 

We next consider a paper by Gregg & Simon (1967) on the comparison of statistical models 
for concept identification such as that proposed by Trabasso & Bower (1964), and 
computer-oriented information-processing models. They note that the ‘credibility’ of models may 
be assessed in several ways such as the accuracy of predictions, the ease and precision with 
which they are made and the range of dependent and independent variables which they cover. In 
this sense, the statistical models have an advantage since they may use the theory of logic and
probability in deriving predictions. 

The other approach to credibility is based on the plausibility of the model in terms of what 
may be intuitively expected and of what facts are already known, i.e. the sort of model which 
has proved useful in other, similar experiments. This view may be formalized by Bayes’ theorem 
and a discussion of this is provided in Gregg & Simon’s paper. 

Statistical and information-processing models will tend to cover the same ranges of dependent 
and independent variables since we are talking about alternative models rather than
generalizations. Therefore the concepts of default axioms and predictions are not as important as
in the previous example. Similarly, as both models yield predictions in terms of the probabilities
of correct responses and are roughly equally tractable, they have comparable empirical contents. 

But there are other ways in which the models differ markedly. Most of these differences stem 
from the fact that the statistical model involves the use of parameter estimation from the data 
which, if carried out by some procedure such as maximum likelihood, guarantees a certain level 
of agreement between predictions and observations. The information-processing model, on the
other hand, makes much less use of parameter estimation — in fact, the case discussed by Gregg 
& Simon has no parameter space at all since the axioms of the model imply values a priori for 
all the parameters of the statistical model. 

This implies that the statistical model is less falsifiable (since it may still agree fairly well with
the data even if its axioms are far from the truth) and more accurate (because parameter 
estimates are made from the data). It is also less complex; for example its axioms, in contrast to 
those of the information-processing model, are not stated in sufficient detail to impute parameter 
values for the model. But its greater accuracy and lesser complexity do not necessarily give it
greater value than the processing model. It was mentioned in the first section of this paper that 
value may be taken as the ratio of accuracy and complexity, other things being equal. In this 
case, there is an important factor (falsifiability) on which the statistical model is at a 
disadvantage. 

We may conclude, then, that the two models are useful in different circumstances; statistical 
models are more suited to the study of the stochastic structure of experimental data while 
information-processing models may still fulfil an important role in the development of computer 
algorithms for concept identification. This illustrates how the value of a model must be 
considered not only with respect to the properties of the model and how they relate to its 
general potential in research but also with respect to the context in which it will be used. 

The discussion of these two examples has raised a variety of points relevant to effective model 
usage and it will be useful to summarize them and their implications for the best exploitation of
models in theoretical research. 

Default axioms are useful in distinguishing simple from complex models and in clarifying the 
ways in which complexity varies from one model to another. Default predictions are useful in 
clarifying how the simplicity of the model affects the strength of the predictions it makes. 

Accuracy suggests how likely it is that the model represents a ‘true’ account (in Suppes’,
1960, sense) of reality while falsifiability and empirical content play a role in determining how
accuracy is to be assessed.

Complexity clarifies how much theoretical machinery is necessary for the model to be an 
account of the phenomenon under study and, finally, value is the extent to which the model is 
successful in performing the functions required of it. These are discussed by the authors cited in 
the opening paragraph of this paper and this discussion will not be duplicated here. 

Finally, it seems likely that the ever-increasing volume of model usage in the literature of 
experimental social science may render desirable a greater awareness of the concepts underlying 
the part played by models in empirical research and, specifically, of how they interact with 
different types of models used in different fields of enquiry. The present paper has suggested 
some lines along which such an argument may be referred to individual models and paradigms, 
as well as representing general features of the effective exploitation of models in empirical 
research in the social sciences.


Cross-cultural study of factors influencing orientation errors in the
reproduction of Kohs-type figures 


Gustav Jahoda 


The aim of the study was to examine pattern difficulty as a general factor influencing orientation errors, and 
to explore psychological differentiation and task perception as variables accounting for cross-cultural 
differences. Samples of 30 boys and 30 girls in Ghana and Scotland were tested with a specially devised 
apparatus under two treatment conditions. Results confirmed the effect of pattern difficulty, but the extent of 
psychological differentiation was only indirectly related to orientation errors. Variations in the manner 
subjects perceived the nature of the task appeared as a major determinant of cross-cultural differences. 


The general background of this research area is discussed in an earlier report on a study 
conducted with Ghanaian children (Jahoda, 1976). In that study two main hypotheses were 
tested. The first was that where the internal structure of the pattern to be copied was relatively 
complex and thus more difficult (operationally defined in terms of completion time), subjects 
would depart more from the correct orientation; this expectation was confirmed. The second 
hypothesis focused on the determinants of gross rotation errors, which had previously been 
investigated by Deregowski (1972) and Serpell (1971 a, b). While their findings substantially 
advanced the understanding of the processes involved, they could not account for the prevalence 
of striking cross-cultural differences. The notion explored in the earlier study was that the degree 
of development of an adequate geometrical reference system would influence the extent of gross 
rotation errors. The concept of horizontality was used as an indicator of development in this 
sphere. Results showed that the relative mastery of the concept did have an effect, but it 
appeared to be mainly an indirect one: horizontality was significantly related to the probability of
subjects making orientation adjustments on completing the patterns, and those who made 
adjustments had significantly fewer rotation errors. Further analysis of the general patterns of 
interrelationships among the findings suggested that an important underlying variable might be 
psychological differentiation, with field-dependent subjects making more rotation errors. This 
would make sense in terms of cross-cultural findings, since African subjects have generally been 
found to make more gross rotation errors (Deregowski, 1972) and to be more field-dependent 
(Witkin & Berry, 1975). 

The objective of the present study was to replicate the effect of pattern complexity and to test 
the field-dependence hypothesis with both Ghanaian and Scottish samples. One additional 
problem was explored, which requires more detailed explanation. In a past study which was only 
peripherally concerned with orientation, the present writer (Jahoda, 1956) attributed rotation 
errors to lack of attention to orientation, which was disproved by later work demonstrating that 
such errors were non-random. However, on the basis of observed behaviour the further 
suggestion was made in that article that ‘the subjects on the whole did not regard the 
figure-ground relationship as part of their task’; and this was supported by the fact that in a 
multiple choice situation errorless pattern reproductions displayed in differing orientations tended 
to be regarded by the subjects as being all the ‘same’ as the model. Subsequent work with 
African subjects by the present writer confirmed the subjective impression that the expression 
‘the same’ does not necessarily have identical referents for experimenter and subjects. 
Developmental aspects of ‘same’ and ‘different’ judgements have been extensively studied in 
recent years (Vurpillot & Moal, 1970; Vurpillot & Taranne, 1974; Blake & Beilin, 1975). The 
findings indicate that younger children have a less strict criterion for ‘same’ than older ones
and/or employ different decision rules in their judgements. In the context of visual scanning Day
(1975) pursued a similar issue, concluding that ‘the subject’s search was, in most cases, directed 
by his conception of the task...’ (p. 167); and it was pointed out that the difference between 
child and adult scanning patterns may be rooted in the kind of question each is seeking to 
answer. All the studies indicate that there is a process whereby conceptions of the task and 
criteria of relevance become gradually modified, and these changes are presumably connected 
with the nature of the learning environment. It is suggested here that in a less ‘carpentered’ 
environment orientation is likely to be a less salient feature, there being relatively far fewer 
occasions in the child’s life when ‘wrong’ orientation of objects in space is being corrected. 
Hence one important element in the differences in the performance of African and European 
children on a reproduction task may lie in the extent to which they regard orientation as a 
critical attribute of ‘sameness’. 

The original plan of the study envisaged a rather elaborate overlapping design whereby part of 
the testing would be carried out with materials previously employed and part with new methods. 
Unfortunately practical difficulties arose in the field (e.g. heavy rains preventing access to the 
village) which forced a radical cutting down of the original design, and even then, some of the 
data remained incomplete. In spite of these obstacles sufficient information was collected to 
throw a considerable amount of fresh light on the problem of orientation. 


Method 
Apparatus and materials 


Reproduction task. The device used in the previous study (Jahoda, 1976) suffered from the drawback that 
orientation errors had to be measured with a protractor. In order to obviate this, the apparatus shown in Fig. 
1 was constructed. The centre bounded by the circular line is attached to a larger circular segment recessed 
into the inside, which rests on ball-bearings. It can therefore be readily moved by slight finger pressure, yet 
there is sufficient friction to prevent it from running freely when the pressure is removed. The pattern is 
constructed within the hollow square, into which the plastic tiles fit. On the edge of the large circular 
segment concealed within the body of the apparatus angles are marked at 5° intervals from 0° to 360°; more 
precise readings are arrived at by interpolation. A small window at the side of the apparatus permits the 
experimenter to take inconspicuous readings. In fact none of the Ghanaian subjects and only a very few of 
the Scottish ones appeared to have been aware of the measurements. 

The set of patterns to be reproduced is shown in Fig. 2. The demonstration items featured both instability 
and asymmetry in order to increase the probability of orientation errors, which could then be corrected in 
the course of training. On the other hand all the test stimuli consisted of symmetric patterns so as to 
eliminate asymmetry as a source of variance. 

The size of each model card was 120 mm square, and the small shapes forming the patterns were 30 mm 
square. The response material consisted of the above-mentioned plastic tiles of the same dimension as the
models, i.e. 30 mm square. They were either red, blue, or divided diagonally into red and blue halves. For all
trials four tiles were needed, and the appropriate ones were placed on the table in front of the subject before
each trial.


Sameness/difference judgements. In this task subjects were presented with pairs of patterns and had to say
whether they were ‘the same’ or ‘different’. For the reasons explained a shortened series had to be used,
which is reproduced in Fig. 3. The pattern on the left always consisted of a standard model card, while that
on the right was constructed by the experimenter with the plastic tiles. Both were in a deep box so as to 
remain invisible to the subject until he stood right above the box. It will be noted that non-identical pairs 
differed either in pattern, or in orientation, or in both; orientation differences ranged from 45° to 180°. Gross 
pattern differences were of course included merely for the purpose of masking the intent of the exercise and 
serving as a check on the understanding of instructions. In fact no subject had to be eliminated. 


Psychological differentiation. The Children’s Embedded Figures Test (CEFT) was used for assessing this
(Witkin, Oltman, Raskin & Karp, 1971). In view of their lack of familiarity with this kind of task, the mode
of administration designed for younger children was routinely applied with African subjects.

Figure 1. Apparatus for reproduction task.

Figure 2. Stimulus patterns. The two demonstration items are on top; the rows show patterns for trials 1-4,

Figure 3. Stimuli for sameness/difference judgements. The two columns on the left show pairs 1-5, those on
the right pairs 6-10.


Subjects 


These were 60 school children in each of Ghana and Scotland, half of them boys and half girls in each sample.
The Ghanaian school was in a village in the Accra region situated in a farming area, but within reach of a 
dormitory settlement for people working in Accra. This is reflected in the occupational distribution of the 
fathers: roughly one-third were farmers, about half labourers or semi-skilled workers, and the remainder
artisans or clerical workers. After eliminating some exceptionally old subjects (i.e. over 16) the sample was
randomly drawn from the first and second classes of the Middle School, constituting the seventh and eighth 
year of schooling respectively. The ages, which are approximate, ranged from 12 to 16 with a mode of 15 
years for both sexes. 

The Scottish children were obtained from the Primary 7 classes of a school in the Glasgow region, whose 
catchment area is chiefly semi- and unskilled working class. All these children were about 11 years old, and 
thus about four years younger than the Ghanaians. The choice of school and age group was of course 
deliberate, since the aim was to analyse processes rather than make comparisons; hence it was desirable that 
the level of performance in the two cultures should not be excessively discrepant. 


Procedures 


At the outset all subjects were given the CEFT. Then the apparatus was positioned as shown in Fig. 1. The 
experimenter began by demonstrating the reproduction of the model pattern, emphasizing that it must be the 
same in every way. The subject then performed the task, being corrected if necessary. This was followed by 
a second training trial which the subject attempted on his own, being helped if needed. Once again the 
orientation was stressed, and if it was faulty (as often happened) the experimenter would carefully point this 
out and proceed to a correction. The subject was also instructed to tell the experimenter when he or she had 
completed a pattern. 

Before the start of the training in the reproduction task, subjects had been randomly assigned to two 
different treatments: angle pre-set (AP) versus angle free (AF). For AP the response board was adjusted to a 
position half-way between horizontal and 45°; depending on whether the code number was odd or even, the
first adjustment was either 22½° or 337½°, i.e. either clockwise or counterclockwise from 0°. This was done
prior to each trial, including training. In the case of the AF treatment, the first training trial began with a 
setting of 0° corresponding to the orientation of the model. For the second training trial this was changed to 
the required 45°, but thereafter the experimenter did not touch the apparatus so that any rotation had to be 
carried out by the subject. In every other respect the treatments were alike.

The two different treatments were introduced in the hope of throwing further light on the determinants of 
subjects’ behaviour in this situation. It should be noted that the chance base is the same in both cases; i.e. if 
a subject completely ignores orientation, the expected mean angular deviation score is 22½°. It is perhaps
not immediately obvious why this should be so in the AF condition, and the reason is as follows: the setting
at the start of the first trial was 45°, and if this remains constant the angular deviation will be 0° for half the
trials with a diagonal presentation and 45° for the other half with a square presentation, yielding a mean of
22½°. All this of course further assumes the absence of any accidental shifts in the course of construction,
which frequently occur in practice and unavoidably blur the picture. Leaving this out of account, it was 
expected that under the AP condition the constant reminder of the importance of orientation by the 
experimenter’s angle-setting before each trial would lead to a lesser mean deviation as compared with the AF 
condition. In the event it turned out that this prediction greatly underestimated the complexity of the factors 
involved in this apparently simple task. 
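
The chance base for the AF condition can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, the split of six diagonal and six square trials is an assumption based on the 'half the trials' wording and the 12 trials mentioned later in the paper, and accidental shifts during construction are ignored.

```python
def mean_deviation_if_board_untouched(board_angle, target_angles):
    """Mean absolute angular deviation when the board stays at board_angle."""
    deviations = [abs(board_angle - target) for target in target_angles]
    return sum(deviations) / len(deviations)


# Assumed trial mix: 6 diagonal (45 degree) and 6 square (0 degree) presentations.
targets = [45.0] * 6 + [0.0] * 6

# AF: the board is left at 45 degrees after the last training trial.
af_chance_base = mean_deviation_if_board_untouched(45.0, targets)

# AP with the 22.5 degree pre-set: the board is equally far from both targets.
ap_chance_base = mean_deviation_if_board_untouched(22.5, targets)

print(af_chance_base, ap_chance_base)
```

Under these assumptions both conditions give the same expected score for a subject who ignores orientation, which is the point made in the text.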

When the actual trials began, the experimenter recorded three measures: (1) the angle; (2) time in seconds 
elapsed between the start of a trial and reported completion; and (3) any adjustments made before the end of 
the trial. These ‘final adjustments’ were defined in terms of the subject’s placing his hand on the central part 
of the apparatus and producing some rotation — however slight, provided it was visible. Such a move may be 
taken as an indication of concern with orientation, however imperfect the outcome. 

The sameness/difference judgements had to be done on a separate occasion, several days after the first 
testing was completed. It was explained to the subjects that they would be shown pairs of patterns and 
would have to decide in each case whether they were the same, or different. While the subject was seated on 
a chair 6 m from the experimenter, the pairs of stimuli were prepared at the bottom of the box. The subject 
was then called and stood in front of the box which was on a chair, looking down into it. Some subjects who 
called the first pair ‘different’ because of slight differences in hue or similarly irrelevant features were told 
to ignore such details, and this was always readily understood. For the purpose of scoring the first trial was 
therefore omitted. 


Results 


The effects of the three major variables studied will be considered first, followed by a more 
detailed analysis of behaviour under the two presentation conditions AF and AP. Any departure 
from the correct orientation will be called an ‘angular deviation’, but Serpell’s (1971 a, b) 
distinction between relatively minor deviations (‘disorientations’) and gross ones (‘rotations’) has 
been adopted here. The upper limit of disorientations has been drawn at 11°, i.e. just under half 
the setting in the AP condition. While the mode for pattern errors was zero in both cultures, 
mean rates were 1.95 in Ghana and 0.95 in Scotland; this difference is highly significant 
(P < 0.001 by Kolmogorov-Smirnov test). Unless stated otherwise, trials resulting in pattern 
errors were excluded from the analysis. 


Effects of pattern difficulty 


This variable is operationally defined in terms of time taken for completion. In Table 1 mean 
angular deviations for the three shortest trials are compared with those for the three longest 
ones, and the sign test was applied. Scottish subjects’ disorientations were about twice as large 
with the more difficult as compared with the easier patterns; and while under the AP condition 
the difference fell just short of significance, it seemed justifiable to pool the data across 
presentation modes which showed the overall difference to be highly reliable. The same was true 
for Ghanaian subjects under AP but not AF conditions; in the latter case difficulty level had no 
discernible effect at all, so that pooling across conditions would have been inappropriate. The 
probable reasons for this lack of any difficulty effect will be discussed later in the general 
context of Ghanaian responses under AF conditions. 


Table 1. Mean angular deviations according to completion time (3 shortest versus 3 longest trials) 

                              Mean deviations 
                            Scotland      Ghana 

Angle free 
  Fastest mean times          4.2°        20.9° 
                            P < 0.05       n.s. 
  Slowest mean times         10.9°        20.7° 
Angle pre-set 
  Fastest mean times          4.4°        11.4° 
                              n.s.      P < 0.01 
  Slowest mean times          7.3°        22.8° 
Overall                     P < 0.01 


Effects of psychological differentiation 


Table 2 gives the distribution of both mean angular deviation and CEFT scores. It is 
immediately obvious from scanning this table that field-dependence cannot be a direct determinant 
of rotation error. In spite of random allocation it so happened that the CEFT scores of Scottish 
and Ghanaian girls in the AF condition turned out to be, respectively, rather low and high in 
relation to their group and thus almost equal in absolute magnitude; yet the mean deviation score 
of the Ghanaian girls was almost twice that of the Scots. The pattern of correlations, similar in 
both cultures, points to a very modest relationship in the expected direction; but all fell short of 
significance. 

An analysis of variance of angular deviation scores was carried out (2 cultures × 2 sexes × 2 
modes of presentation); apart from a marginally significant and uninterpretable triple interaction, 
only the cultural difference was massively significant (F = 63.00, d.f. = 1, 112, P < 0.001). 

A similar analysis (2 cultures × 2 sexes) of CEFT scores indicated that not only the cultural 
(F = 29.82, d.f. = 1, 116, P < 0.001) but also the sex difference was significant (F = 7.45, d.f. = 1, 
116, P < 0.01). 


Table 2. Mean CEFT and angular deviation scores according to culture, sex and presentation 
mode; and correlations for both sexes combined 

                       Scotland              Ghana 
                    Boys     Girls       Boys     Girls 

Angle free 
  CEFT              19.2     12.9        13.6     12.4 
  Deviation          5.2°     9.6°       22.3°    18.3° 
  r (CEFT-Dev)          -0.27                -0.32 

Angle pre-set 
  CEFT              18.1     19.9        14.3      9.9 
  Deviation          6.2°     6.8°       15.4°    17.7° 
  r (CEFT-Dev)          -0.22                -0.19 


Conceptions of ‘sameness’ 


The responses to the matching test, grouped into categories, are shown in Table 3. The striking 
feature of the Ghanaian responses is that they seem to constitute a perfect scale, in the sense 
that no subject called a rotation of 90° or more ‘same’ unless he had characterized both 45° 
rotations as such. Moreover, only one Ghanaian subject made a genuine matching error. The 
term ‘errors’ is placed in inverted commas within the table, since the responses are entirely 
consistent and reflect a coherent picture of a conception of ‘sameness’ which for most subjects 
involved toleration of a 45° rotation. 


Table 3. Judgements of ‘sameness’ and ‘difference’: numbers of subjects giving certain types of 
responses 

                                                                 Ghana    Scotland 

Only one 45° rotation called ‘same’                                7         5 
Both 45° rotations called ‘same’                                   8         1 
Both 45° and one or more of 90°, 135° and 180° called ‘same’       9         - 
One or more of 90°, 135° and 180° rotations called ‘same’          -         8 
Calling two identical ones ‘different’                             1         4 
No matching ‘errors’                                               7        42 

Total                                                             32        60 

Grouping first three and next two: χ² = 40.32; d.f. = 2, P < 0.001. 

This is in contrast with the pattern of Scottish children’s responses, which appear more 
random in nature; coupled with the fact that their rate of calling identical patterns ‘different’ was 
higher, this suggests carelessness rather than a different conception of ‘sameness’. However, the 
salient aspect of their responses is of course their low level of matching ‘errors’ of any kind: 30 
per cent as compared with 78 per cent for Ghanaian children. 
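
The χ² value reported under Table 3 and the two error percentages can be reproduced from the printed counts. The following is a minimal sketch, assuming that 'grouping first three and next two' means collapsing the first three rows into one group, the next two into a second, and leaving 'no matching errors' as the third; the function name is illustrative and not part of the original analysis.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Columns: Ghana, Scotland. Counts are taken from Table 3.
grouped = [
    [7 + 8 + 9, 5 + 1 + 0],  # rotated patterns called 'same', 45 degrees included
    [0 + 1, 8 + 4],          # the next two rows of the table
    [7, 42],                 # no matching 'errors'
]
chi2, dof = pearson_chi_square(grouped)

# Share of subjects with at least one matching 'error' of any kind.
error_pct_ghana = 100 * (32 - 7) / 32
error_pct_scotland = 100 * (60 - 42) / 60

print(round(chi2, 2), dof, round(error_pct_ghana), round(error_pct_scotland))
```

With this grouping the statistic agrees with the value printed under the table, which supports the reading of the grouping note adopted here.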

The relationship between what for the sake of brevity will now simply be called matching 
errors, and angular deviations was examined. A one-way analysis of variance was carried out 
with the Ghanaian data, grouped into 0 or 1 versus 2 versus 3 or 4 matching error subjects. The 
result was highly significant (F = 7.00, d.f. = 2, 29, P < 0.005). A corresponding t test for the 
Scottish sample (nil versus some errors) was also significant (t = 2.54, d.f. = 58, P < 0.02). 

There is thus clear evidence of an association between matching and angular deviation scores, 
but this could be interpreted as indicating merely that subjects tend to be less sensitive to 
orientation in both types of tasks, without involving the concept of ‘sameness’. Such an 
interpretation is rendered less plausible by the orderly scale property of the responses, which 
strongly suggests that Ghanaian subjects had a consistent tolerance range within which they 
would accept two stimuli as being ‘the same’. It is the detection of such regularities which in the 
literature has led to inferences about children’s conceptions of ‘sameness’. 


CEFT and matching scores in combination 


Although it became evident that psychological differentiation has very little independent effect 
on deviations, it might have an indirect influence. In examining this possibility contingency 
tables for CEFT and matching error were constructed by dichotomizing both variables into 
‘high’ and ‘low’. Since the shapes of the distributions for the two samples were similar, it 
seemed justifiable to combine them, with the result set out in Table 4. There was a significant 
tendency for field-independent subjects to make fewer matching errors. 


Table 4. CEFT and matching errors: Scottish and Ghanaian samples combined 

                          CEFT 
Matching errors       High     Low     Total 
Few                    37       27       64 
Many                    8       20       28 
Total                  45       47 

χ² = 6.67, d.f. = 1, P < 0.01. 
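
For a 2 × 2 table such as Table 4 the statistic can be obtained from the shortcut formula N(ad - bc)² divided by the product of the four marginal totals. The sketch below is an illustration with a hypothetical function name; the optional Yates continuity correction is included only for comparison and is not mentioned in the paper.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] via the shortcut formula."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink |ad - bc| by n / 2, never below zero.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Table 4 counts: rows Few / Many matching errors, columns High / Low CEFT.
uncorrected = chi_square_2x2(37, 27, 8, 20)
corrected = chi_square_2x2(37, 27, 8, 20, yates=True)

print(round(uncorrected, 2), round(corrected, 2))
```

The uncorrected value matches the figure printed under the table, so the reported statistic appears to have been computed without a continuity correction.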


The combined effect of psychological differentiation and matching errors on angular deviation 
scores was tested by analysis of variance with the Scottish sample (the Ghanaian subsample for 
whom matching errors were available being too small); the interaction fell short of significance. 
Nevertheless this was further explored by plotting the relationships for each culture, as shown in 
Table 5. A clear-cut trend, consistent within each culture, emerged: for field-independent 
subjects, the extent of matching errors does not appear to make much difference to angular 
deviations; however, in the case of relatively field-dependent subjects their performance is 
almost the same as that of field-independent ones when their matching errors are few; but if 
there are many then their mean angular deviations become almost twice as large. While it must 
be kept in mind that this effect could not be adequately tested statistically, the almost exact 
concordance in the two samples strongly suggests that this is a genuine effect. 


Table 5. Mean angular deviations by CEFT and matching errors for Scotland and Ghana 

                          CEFT 
Matching errors       High      Low 
Scotland 
  Few                  5.4°     6.7° 
  Many                 5       12.4° 
Ghana 
  Few                 11.0°    13.5° 
  Many                13.9°    23.2° 


Angle free (AF) versus angle pre-set (AP) 

Analysis of the data presented so far indicates that a majority of the African as opposed to a 
small minority of Scottish subjects are characterized by what might be called an ‘indifference 
range’ for rotations up to about 45°, with some having an even wider one. There is also the 
preference for ‘stable orientation’ (i.e. sides parallel to the edge of the table) established by 
Deregowski (1972). Furthermore, the AP condition is one where the experimenter adjusts the 
angle prior to each trial, thereby providing a constant reminder that orientation matters; this is 
absent in the AF condition, where the response board is, at the beginning, always in an exact 
diagonal (45°) position following the last training trial, and thereafter entirely under the subject’s 
control. 

On this basis it is possible to formulate some expectations about the subjects’ probable 
behaviour under the two treatment conditions. Following Deregowski, it could be predicted that 
subjects will exhibit smaller deviations on horizontal as compared with diagonal patterns. In both 
AF and AP African subjects will tend to produce ‘rotations’ by being content to leave the blocks 
in an orientation within their indifference range. However, subjects in the AP conditions will 
exhibit more awareness of orientation needs by a greater frequency of adjustments. 

The detailed examination of the data was based primarily on comparisons of the initial and 
final angle setting on each trial, and thus it was not necessary to exclude incorrect pattern 
completions from the main analysis. In the case of condition AF a simple model was set up, to 
which actual responses were related. Ignoring the minor accidental dislocations which occurred 
when placing the blocks, it was postulated that (a) if two identically oriented patterns follow each 
other, the difference in angular setting between the beginning and the end of each trial should 
be either 0° or some multiple of 90°; and (b) in a sequence involving a horizontal and a diagonal 
pattern the difference should be either 45° or 45° plus some multiple of 90°. It was decided to call 
(a) the constant and (b) the variable, and the actual mean values for each (MC and MV) were 
calculated. The results were scrutinized and a set of categories constructed, which will be 
explained. 

The first category is closest to the ideal, though some allowance has to be made for drift 
and/or minor disorientations, whose limits were set at 11°. The second category provides even 
more latitude, the lower limit for MV being put at 22.5° or half the optimal range; in order to 
ensure that this category still reflects a clear separation between (a) and (b), a minimum ratio of 
3:1 for MV:MC was arbitrarily laid down. Two more categories indicate an absence of (a) 
versus (b) differentiation: the fourth category refers to cases where both MC and MV are less 
than 12°; the fifth (excluding any case in the fourth) are instances where MC is larger than MV; 
lastly there is a third and unspecified intermediate category. 
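
The category rules just described can be written as a small decision procedure. This is a sketch under stated assumptions, not the original scoring method: the function name, the order in which the rules are tested, and the treatment of exact boundary values (for instance whether 11° is inside the tolerance) are choices made here for illustration.

```python
def classify_af_response(mv, mc):
    """Assign an AF subject to category I-V from mean variable (MV) and constant (MC) shifts."""
    # I: close to the ideal of MV = 45 degrees and MC = 0 degrees, within 11 degrees.
    if abs(mv - 45.0) <= 11.0 and abs(mc) <= 11.0:
        return "I"
    # II: MV above half the optimal value and at least three times MC.
    ratio = mv / mc if mc > 0 else float("inf")
    if mv > 22.5 and ratio > 3.0:
        return "II"
    # IV: hardly any movement in either kind of sequence.
    if mv < 12.0 and mc < 12.0:
        return "IV"
    # V: more movement on constant than on variable sequences.
    if mv < mc:
        return "V"
    # III: the unspecified intermediate remainder.
    return "III"


examples = [(44.0, 3.0), (30.0, 5.0), (20.0, 15.0), (5.0, 8.0), (20.0, 30.0)]
print([classify_af_response(mv, mc) for mv, mc in examples])
```

Testing category IV before category V reflects the stated exclusion of fourth-category cases from the fifth.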

These categories are set out systematically in Table 6, which makes it evident that nearly 
two-thirds of the Ghanaian subjects seemed quite content to use the response board as it 
happened to be left from the previous trial; and it would appear that the main difference between 
categories IV and V is that the latter produced larger shifts in the course of pattern construction. 

Table 6. ‘Angle free’ treatment condition: Distribution of subjects in main response categories 
according to culture 

Response categories                        Ghana    Scotland 

I    MV: 45° ± 11°, MC: 0° ± 11°             6        22 
II   MV > 22.5°; MV/MC > 3.0                 1         3 
III  Other                                   5         3 
IV   MV < 12° and MC < 12°                  13         2 
V    MV < MC (excluding IV above)            5         - 

Total                                       30        30 

(Combining I and II, IV and V) χ² = 23.43, d.f. = 2; P < 0.001. 
Note: MV stands for mean of the variable sequences involving pairs of differently oriented patterns; MC 
stands for mean of the constant sequences involving pairs with the same orientation. 


In the ‘angle pre-set’ condition subjects had to move the response board 22½° on each trial if 
they were to achieve an exact orientation response. Hence the measure adopted here was that of 
‘no change’, where the subject altered the orientation only very slightly or, apart from slight 
drifts during construction, not at all. The operational definition embodied the same range of 
tolerance as that employed in the AF condition, namely plus or minus 11°, irrespective of 
whether the movement was just under half-way towards the target or in the opposite direction 
away from it. There were a few cases of substantial rotation which presented a problem, as they 
ended up in an equivalent orientation to the original pre-setting (i.e. either 22½° or 337½° ± 11°); 
since this implied that the subjects accepted this position, it was decided to include them in the 
‘no change’ category. Lest it be thought that the tolerance range of 11° might be too generous, it 
should be mentioned that the actual mean deviation from the pre-set orientation, identical for 
both Scotland and Ghana, was only 3.3°. Results of the analysis are given in Table 7. Although 
the difference between Ghana and Scotland is still highly significant, it should be noted that it is 
less extreme than under the AF condition where there was very little overlap. 
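
The ‘no change’ criterion can be expressed as a test on the final board angle. The sketch below is one possible reading of the definition, with hypothetical names: a trial counts as ‘no change’ when the final angle lies within 11° of either pre-set position, which also covers the few substantial rotations that ended in an equivalent orientation. Angles are compared on the circle so that 337½° and 350° are treated as close.

```python
PRESETS = (22.5, 337.5)   # the two pre-set positions, in degrees
TOLERANCE = 11.0          # same tolerance as in the AF analysis


def circular_difference(angle_a, angle_b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(angle_a - angle_b) % 360.0
    return min(diff, 360.0 - diff)


def is_no_change(final_angle):
    """True when the final angle is within tolerance of either pre-set position."""
    return any(circular_difference(final_angle, preset) <= TOLERANCE
               for preset in PRESETS)


print([is_no_change(angle) for angle in (25.0, 340.0, 0.0, 45.0)])
```

A board brought back to 0° or turned to 45° falls outside both tolerance bands and is therefore scored as a change under this reading.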


Table 7. ‘Angle pre-set’ treatment condition: Numbers of subjects in ‘no change’ categories and 
mean numbers of ‘no change’ per category according to culture 

                                        Ghana                      Scotland 
No. of ‘no change’ responses,              Mean no.                   Mean no. 
grouped into frequency                     ‘no change’                ‘no change’ 
categories                          n      per subject         n      per subject 

I     0                             2         0.0             11         0.0 
II    1-4                          17         2.4             16         1.8 
III   5-8                           5         6.4              3         5.7 
IV    9-12                          6        10.2              -          - 

Overall mean number of ‘no change’ responses: Ghana, 4.47; Scotland, 1.53; t = 4.06, d.f. = 58, P < 0.001. 
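
The overall means quoted under Table 7 follow from the category sizes and category means as weighted averages. The short check below is illustrative; the helper name is an assumption, and the inputs are the rounded figures printed in the table.

```python
def overall_mean(groups):
    """Weighted mean from (n, group_mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# (number of subjects, mean 'no change' responses per subject) for categories I-IV.
ghana = [(2, 0.0), (17, 2.4), (5, 6.4), (6, 10.2)]
scotland = [(11, 0.0), (16, 1.8), (3, 5.7)]

ghana_mean = overall_mean(ghana)
scotland_mean = overall_mean(scotland)
print(round(ghana_mean, 2), round(scotland_mean, 2))
```

Both results agree with the printed overall means to two decimal places, which also confirms that each sample contains 30 subjects.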


The prediction concerning ‘stability’ was tested by comparing the mean deviations for squares 
versus diamonds. Under the AF conditions these were, respectively, 13.1° and 27.5° for Ghana 
(t = 4.58, d.f. = 29, P < 0.001); and 4.3° and 10.5° in Scotland (t = 3.30, d.f. = 29, P < 0.005). 
Although rather unlikely, there is the possibility under condition AF that this might be an 
artifact, for the following reason: the last training trial, accompanied by instructions about the 
importance of orientation, was a diamond. Now if one assumes that this instruction was 
remembered and carried out for the first experimental trial only, to be neglected thereafter, the 
tendency not to adjust the response board would in itself result automatically in greater 
apparent rotations of the diamond shape. In fact, even among the Ghanaian subjects 21 out of 30 
had a disorientation of less than 11° on the first experimental trial, indicating that the 
supposition is reasonable. 

This problem can be resolved by considering the AP condition, where the initial setting was 
constant, and focusing on the incidence of ‘no change’ responses for diamond as compared with 
square stimuli. The proportions were 0.72 for Ghana and 0.68 for Scotland, indicating a highly 
significant tendency (P < 0.001 and 0.01 respectively by binomial test) to leave the diamond 
pattern unchanged. Moreover, in addition to the ‘no change’ responses there were 12 Ghanaian 
and three Scottish subjects who produced one or more rotations to stability (the difference by 
Fisher’s exact test is P = 0.008); on the other hand there were only three subjects, all Ghanaians, 
who each made one rotation to the diamond. Hence one may be confident that the findings in the 
AF condition were not an artifact, and the ‘stability’ prediction was fully confirmed. 
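
The Fisher exact probability for the rotations to stability can be recomputed from the counts given, taking 30 subjects per culture in the AP condition as in Table 7. The sketch below sums the hypergeometric tail directly; the function name is hypothetical, and the choice of a one-tailed probability is an assumption made because it is the version that agrees with the reported figure.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the 2 x 2 table [[a, b], [c, d]]."""
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of 'successes'
    n = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(comb(row1, k) * comb(n - row1, col1 - k)
                     for k in range(a, upper + 1))
    return favourable / comb(n, col1)


# Rows: Ghana, Scotland. Columns: rotated to stability at least once, never did.
p_one_sided = fisher_exact_one_sided(12, 18, 3, 27)
print(round(p_one_sided, 3))
```

Because the two groups are of equal size, the two-tailed probability would be about twice this value, so the figure in the text appears to correspond to a one-tailed test.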

Discussion 

The more detailed analysis of performance shows that behaviour in AF and AP is very different 
for Ghanaian, though not Scottish subjects. In AF the majority of Ghanaians behaved as though 
entirely unconcerned with orientation, while this was true only of a minority in AP. One can 
further examine this by taking frequency of final adjustment as an index of extent of 
preoccupation with orientation. For the Scottish subjects the mean adjustment rate out of 12 
trials was 9.67 for AF and 10.57 for AP, a slight but non-significant difference in favour of AP. 
The important point is, however, that most Scottish subjects adjusted the orientation on almost 
every trial irrespective of condition. 

The contrasting picture for Ghanaian subjects is presented in Table 8, which relates 
adjustments to response categories. There is a close correspondence between these categories, 
based on the extent of rotations, and the adjustments made. Moreover, the difference between 
adjustment rates in conditions AF and AP is significant, confirming the view that this reflects 
varying concern with orientation. 


Table 8. Mean adjustment rates for Ghanaian subjects according to condition and response 
categories 

         Angle free                          Angle pre-set 
Response                             Response 
categories*        n     Mean        categories†        n     Mean 

I and II           7     11.3        I and II          19     10.2 
III                5      6.6        III                5      8.2 
IV and V          18      2.3        IV                 6      1.5 
Overall           30      4.7        Overall           30      8.1 

Difference between overall means (by Kolmogorov-Smirnov test) P < 0.01. 
* See Table 6 for definition of categories. 
† See Table 7 for definition of categories. 

Given this contrast between the conditions, an obvious question is why this failed to result in 
a significant difference in angular deviation scores. A partial answer is the high variance, inflated 
by occasional gross rotations of up to 180°. A more interesting factor emerges from the 
observation of behaviour in the AP condition: the adjustment of diamond patterns by Ghanaian 
subjects, while definite and quite clear cut, tended to be rather slight in magnitude. In 
conjunction with the findings of the pattern matching task this suggests that Ghanaian subjects, 
unlike Scottish ones, had no clear conception of an angle of 45° which needed to be reproduced. 
Most behaved as though they had merely a somewhat vague implicit notion of the following 
kind: ‘This pattern (the stimulus) is not straight, and the response board is not straight either; 
but perhaps it should be inclined just a little more and then it will be the same’. This applied 
predominantly to the diamond shapes, for as Rock (1972) put it ‘. . .the horizontal and vertical 
orientations are singular; change them even a few degrees and you make a very noticeable 
change. But change a line tilted 10° from the vertical to 15° from the vertical and the change may 
not even be noticed’ (p. 10). Another related factor is that among those who adjusted, 12 
Ghanaians did so in the wrong direction, i.e. towards stability; this also had the effect of 
increasing overall mean deviations. Taken together, these factors account for the lack of 
significant differences between AF and AP in spite of the fact that these conditions elicited 
contrasting patterns of behaviour. 

This is the appropriate place to return to another question, namely, why there was no task 
difficulty effect for Ghanaians under the AF condition. The answer seems to be that an essential 
prerequisite of this effect is an active concern for orientation, which was largely absent under 
AF. If one looks at subjects in AF response categories III-V whose rates of adjustment were 
low and compares fastest with slowest response times the outcome in terms of mean angular 
deviations is as follows: 11 greater, 11 smaller and one equal, i.e. entirely random. On the other 
hand six out of the seven subjects in categories I and II had greater deviations on the slower 
times. The reason why this did not show up in the overall results is that these deviations were 
on the average rather small (7.3°) as compared with the random fluctuations (14.3°), and were 
thus swamped. 

This raises yet another issue, since in the present writer’s previous study (Jahoda, 1976) the 
difficulty effect was encountered consistently across two different conditions, albeit within-subject 
ones. Although the Ghanaian sample in that study was perhaps slightly more ‘modernized’, it is 
most unlikely that this could account for the much lower level of deviations, which in one 
condition were disorientations rather than rotations. The main determinant of these sharp 
differences between the two Ghanaian samples is almost certainly to be found in the substantial 
variations in the mode of administering the orientation task. 


Conclusion 


In considering the implications of the present study, it is necessary to distinguish two main 
problems: (a) the factors influencing orientation errors in general and (b) those responsible for 
cross-cultural differences. With regard to (a), findings are in accord with those of Serpell (1971 a, b) 
and Deregowski (1972). They further confirm the effect of task difficulty first demonstrated in 
earlier work (Jahoda, 1976), though this appears to be a relatively weak effect confined to 
circumstances where the subject is actively concerned with achieving a correct orientation. 

As for (b) the problem of cultural differences, the present writer’s expectation that 
psychological differentiation would prove to be a decisive variable was disconfirmed. Instead, 
the subjects’ conception of ‘sameness’ was found to be a key feature, which in turn influenced 
the manner in which the task requirements were perceived. This is not just a semantic issue 
dependent upon the verbal formulation of the instructions since the subjects understand quite 
clearly what ‘same’ and ‘different’ mean in relation to the internal structure of the patterns. 
Moreover, after the detailed explanation and demonstration given during training, subjects often 
began by responding correctly on one or two experimental trials, only to relapse thereafter into 
neglecting orientation. The disposition to respond in this manner appears strongly established in 
some subjects, but readily susceptible to situational modification in others. 

Unfortunately the data on this important aspect remained incomplete, and it was therefore not 
possible to explore individual variations in any depth. There was, however, a significant 
association between the amount of error in matching scores and field dependence, indicating that 
psychological differentiation may have at least an indirect effect. The same may apply to the 
understanding of the concept of horizontality examined in the previous study, though it would 
appear from the results that this is most likely to be operative where a square design has to be 
constructed on a diamond-shaped background (Jahoda, 1976). 

In general, it is evident that the determinants of orientation error are probably more complex 
than has hitherto been envisaged; and the confounding of individual, cultural and situational 
differences makes them difficult to disentangle. Nonetheless, the evidence strongly points to the 
key importance of the subject’s perception of the task, which has also been noted in other 
contexts; Day (1975) discusses research dealing with children’s conceptions of task requirements, 
and the way this affects their behaviour. Lastly, one source of confidence in the present findings 
lies in the fact that all the phenomena observed were found in both the Ghanaian and Scottish 
subjects, though their incidence in these populations was very different. 


Staff ratings of children’s behaviour in hospital: Comparability of factor 
structures 


Fred Clough 





Behaviour-ratings for a common sample of 108 young adolescent girls in an orthopaedic hospital situation 
were obtained concurrently from three independent groups of ward-staff observers. The ratings were factor 
analysed both within observer groups, and across groups (using mean scale item scores), and factorial 
structures were analysed for comparability. It was found that the independently derived group structures 
were highly similar, replicating a four-dimensional system common to a number of analyses in other social 
settings, and recently given prominence in a review by Howarth (1976). In particular, the separation of 
fearful from hostile emotional behaviours, and from a neutral sociability dimension, was predicted and 
confirmed. However, whilst correlations between matched factor loadings were generally high, those 
between matched factor scores were much lower. It is argued that the procedure of averaging judges’ ratings 
prior to factor analysis is useful not only to increase reliability, but to improve validity by extending the 
range of sampling across different role situations. Hence in spite of a relatively low inter-judge agreement, 
the combined solution offers a more valid composite criterion of ‘characteristic ward behaviour’ having a 
high multiple correlation with the independently derived predictor group observations. 





The comparability of factorial structures derived from observer behaviour ratings continues to be 
debated (Howarth, 1976). An emergent empirical generalization is that two broad dimensions of 
emotional stability and extraversion (sociability) will be identifiable in any reasonably 
representative set of ratings. In view of the diverse sampling of scale items, subjects, judges and 
situations, as well as variations in methods of analysis, the wide replication of more 
differentiated structures will be less readily attained. 

Two major explanations have been offered to account for the universality of the emotional 
stability-extraversion structure. Eysenck (1967) proposes that they represent phenotypic 
indicators of fundamental personality differences, biological in origin. On the other hand, 
Peterson (1965) argues that they reflect communalities in language usage amongst judges 
(including the self), similar to the broad dimensions of connotative meaning established by 
Osgood et al. (1957). Hallworth (1965), following a series of studies of teacher ratings, came to 
the same conclusion. 

What is omitted from these perspectives is a consideration of the potential systematic 
variation attributable to situations, where the emphasis will be upon the structure of situated 
behaviours rather than upon the stable characteristics of either subjects or observers in general. 
Studies in the personality-trait tradition, particularly those utilizing self-ratings, tend to be 
conducted in a ‘situational vacuum’, so that it is not surprising that Becker (1960), and many 
subsequent investigators, found little evidence for a correspondence between behaviour rating 
and questionnaire personality factors. Questions concerning the structure of situated behaviours, 
however, must be distinguished from those concerning personality traits. 

Problems of interpretation in the case of behaviour rating dimensions are complicated by the 
presence of an inevitable person × situation interaction. Dispositional tendencies, for example, 
will influence individual thresholds for emotional reactions, but whether these will be aroused at 
all and the form they will take (fear or hostility) will depend on the nature of the situation and 
the individual’s interpretation of events (Lazarus, 1966). Similarly, sociability will be constrained 
by interaction opportunity, and by the ‘approachability’ and ‘knowability’ as well as the 
‘likeability’ of significant others (Gibson, 1971; Vingoe, 1973). Social psychologists have also 
drawn attention to the functional significance of interaction-seeking in emotionally arousing or 
uncertain situations (Schachter, 1959). Hence some of the major dimensions established in 
behaviour ratings can be interpreted equally well from a situational as from a dispositional 
perspective. 

A second set of problems concerns the reliability of behaviour ratings and the stability of 
obtained factor structures. The comparability of rating structures across different social settings is 
of interest from both the personality and the situational perspectives, and common problems are 
introduced by potential observer factors. In the first place matched factor structures have no 
necessary implications for trans-situational consistencies in individual behaviour, even where 
both judge and subject samples remain constant. Judges may employ identical systems of 
classification, yet order individuals in quite different ways in terms of that common system. 
Nevertheless, it is difficult to conceive of adequate tests of the personality hypothesis using 
observer ratings unless there is adequate comparability of judges’ classification systems in the 
first place. Whilst the cross-situational consistency of rating observations and structures is less 
imperative from a situational perspective, similar questions may be asked about the 
comparability of judges within any given social setting. In this case, additional problems may be 
encountered in attempting to define ‘a situation’, so that variations between observers may be 
due to differences in situational sampling rather than in observer error. 

There is, however, encouraging evidence that structural congruency can be achieved, even 
where subjects and observers differ across studies, so long as salient situational factors and 
rating scales are controlled. Studies of adult groups, initiated by Tupes & Christal (1961) and 
subsequently extended by Norman (1963), provide evidence for a recurrent five-factor structure 
of peer ratings differentiating between independent dimensions of agreeableness, extraversion, 
emotional stability, conscientiousness and culture (or intelligence). The important improvement 
in this structure over a two-factor system lies in the differentiation of emotionality into 
recognizable fear (instability) and hostility (disagreeableness) components, whilst maintaining the 
independence of social extraversion; the fourth factor, ‘conscientiousness’, refers to task 
orientations. Borgatta (1964) replicated a highly similar structure of dimensions common to self 
and peer rankings, and established adequate discriminant validation using the stringent 
multitrait-multimethod design advocated by Campbell & Fiske (1959). 

The importance of these studies has recently been fully recognized in a reassessment of 
Cattell’s work by Howarth (1976), who questions whether Cattell’s ‘personality sphere’ factors 
were correctly identified in his original foundational behaviour-rating studies. In effect, Cattell 
(1947) had isolated approximately 12 dimensions from observer ratings, whilst Tupes & Christal 
(1961) using the same scale items, established only five strong recurrent factors. Howarth 
suggests that Cattell overfactored in his initial work, and a reanalysis using more objective 
methods identified a six-factor system showing considerable agreement with the findings of the 
Tupes-Norman-Borgatta studies. 

It is clear that a five-dimensional structure of behaviour ratings offers considerable advantages, 
when compared either with the relatively undifferentiating two-factor system of Eysenck or with 
the unstable multiple factor system of Cattell. The question arises as to whether replication can 
be maintained across different behaviour settings, especially where naive participant observers are 
utilized. Herbert (1974) does report a structure of teacher ratings for pupil behaviours which 
shows increased differentiation over previous studies in that field, and which does approach the 
Tupes & Christal structure, in spite of the specialized setting. 

The present study aims to investigate the dimensionality of staff ratings for a sample of young 
pre-adolescent girl patients in an orthopaedic hospital ward. In spite of considerable research 
concerned with the psychological reactions of patients to hospitalization, there has been little 
systematic analysis of behaviour-rating structures, apart from psychiatric samples. In addition to 
exploring the generality of the five-factor structure in a hospital setting, a further line of inquiry 
is also pursued. This concerns the comparability of both rating structures and factor scores when 
three different groups of judges are required to rate the same patient sample. Central to these 
questions are problems in the assessment of reliability and validity, as discussed by Levy (1974). 

Method 
Subjects 


108 girls aged 3-15 years (mean 13 years 3 months; s.d. 1 year 9 months) were the ratees. All were in-patients 
of children’s orthopaedic wards, one ward being selected from each of three similar hospitals. Disregarding 
hospitals, three groups of raters were identified amongst the female hospital staff: group (T) were hospital 
teachers; group (N1) were fully qualified experienced staff nurses and ward sisters, whilst group (N2) were 
student nurses and unqualified nursing assistants. All raters had close daily contact with the children in the 
wards, though the current length of patients’ hospital stay varied widely (mean 27.9 days; s.d. 27.8 days). 
Similarly, there was a wide variation in illness and prior hospitalization history, in diagnosis, and in the 
nature of medical treatment. In some cases (scoliosis), children had been severely restricted and recumbent 
in long plaster casts for several months, whilst others were treated for minor orthopaedic surgery or were 
under observation. As far as possible attempts were made to ‘measure’ these factors, as well as a range of 
important cognitive variables, though their relationship to observed behaviours will not be reported in this 
paper. 


Procedure 

A standard rating instrument containing 28 bipolar seven-point scales was used throughout (see Table 2). 
Marker scales covering the major factors of Tupes & Christal (1961) were included, but other scale items 
considered appropriate to patient behaviour were specially constructed. Each child was rated independently 
on the same occasion, by three staff observers, one from each of the groups T, N1 and N2. Judges were 
instructed individually to report the recent characteristic behaviours of the patient in the ward, to consider 
each scale item independently, and to use the full range of categories on the seven-point scale. Only the 
end-points of each bipolar scale were written into the rating schedule, and these were separated by a broken 
line of seven equal intervals. The descriptive labels at the end-points were supplemented by additional 
quantifiers such as ‘very’, ‘extremely’, ‘not at all’, etc., and the mid-point was defined as an intermediate or 
neutral category on a continuum. 

The sampling of judges and of occasions was random, in so far as patients and staff were entered into the 
study as available at the time of the irregular visits of the researcher over an 18-month period. Most judges 
were used on more than one of a total of 29 sampling occasions, each involving the rating of from two to 
six patients. Altogether, eight teachers, 12 senior nurses and 22 unqualified nurses contributed in different 
combinations. Each patient was rated once only. 


Results 
Inter-judge agreement within length of stay categories 


It was suspected that staff ratings might be affected by the amount of time a child had spent in 
the ward, since this would largely determine interaction opportunity. On the other hand, staff 
were instructed to assess, as far as possible, patients’ recent behaviour in the ward, and to 
attempt to ignore more distant impressions. If this difficult instruction has been followed 
successfully, inter-judge agreement should not be significantly greater for ratings of long-stay 
than for short-stay patients. 

In order to test this hypothesis, the total patient sample was divided into three equal groups 
(n = 36) according to current length of stay (short stay 1-15 days, medium stay 16-45 days, and 
long stay 46-208 days). Correlations were computed between each pair of observer groups (T, 
N1 and N2), within each length of stay category and across all 28 scales of the rating schedule. 
All the product moment coefficients were statistically significant, well beyond the 0.001 level, 
and ranged from 0.308 to 0.473 with an overall average correlation of 0.40. No difference 
between any pair of coefficients was statistically significant. There was thus evidence for 
significant agreement between all groups of raters, and no evidence for a significant variability in 
agreement due to patient length of stay. 

This result indicates that the typical reliability of a single rater’s ratings is of the order of 0.49, 
and the estimated reliabilities for each set of ratings based on inter-judge correlations are shown 
in Table 1. Whilst these reliabilities are low, there is a sufficient basis of agreement for pooling 
ratings from the three observer groups, and carrying out further analysis using mean scale 
scores. In this case the Spearman-Brown formula can be used to estimate the increased 
reliability yielded by a threefold increase in test length (Guilford, 1965, p. 466). These reliability 
estimates for averaged item ratings are also shown in Table 1, and range from 0.63 to 0.72 across 
the length of stay groups. There is a slight but consistent tendency towards an increase in 
reliability for ratings of the long-stay patients. 

Table 1. Estimated reliability coefficients for behaviour ratings of patient groups differing in 
length of stay 

                     Short stay     Medium stay     Long stay 
                     (1-15 days)    (16-45 days)    (46-200 days)    Total sample 
Rater groups         n = 36         n = 36          n = 36           n = 108 

Teachers             0.39           0.34            0.45             0.39 
Nurses (1)           0.33           0.38            0.47             0.39 
Nurses (2)           0.37           0.38            0.46             0.40 
Combined ratings     0.63           0.63            0.72             0.67 
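
The Spearman-Brown step-up described above can be illustrated with a minimal sketch. The function name `spearman_brown` is hypothetical, and taking a single average coefficient (the overall correlation of 0.40 from the text, or the mean of the three long-stay coefficients in Table 1) as the single-rater reliability is an assumption made here for illustration, not a statement of how the published figures were computed.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Estimated reliability of the mean of k parallel ratings,
    given the reliability r_single of one rating."""
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r_single / (1.0 + (k - 1.0) * r_single)


# Overall average inter-judge correlation reported in the text (assumed
# here to stand for the reliability of a single rater group).
total_combined = spearman_brown(0.40, k=3)

# Mean of the three long-stay coefficients in Table 1 (0.45, 0.47, 0.46).
long_stay_single = (0.45 + 0.47 + 0.46) / 3
long_stay_combined = spearman_brown(long_stay_single, k=3)

print(round(total_combined, 2))      # 0.67
print(round(long_stay_combined, 2))  # 0.72
```

Under these assumptions the threefold step-up gives values that agree with the corresponding combined-ratings entries in Table 1.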


Principal components analysis (1): Mean staff ratings 


Ratings were averaged across the three judges for each scale item and for each patient. The 
resulting mean scores for the 28 item rating scales were intercorrelated, and the matrix with 
unities in the diagonal was submitted to a principal components analysis. Five components with 
latent roots over 1.0 extracted 76.36 per cent of the total common variance. These were rotated 
to orthogonal positions using the varimax method. The results are shown in Table 2, where scale 
items are defined by one abbreviated pole of the original seven-point bipolar format. 
Interpretation of the factors was as follows: 


Factor 1. Compliance-non-compliance. This corresponds directly to the agreeableness factor of 
Tupes & Christal (1958) or (in reverse) to the assertiveness factor of Borgatta (1964). Howarth 
(1976) calls it cooperativeness or considerateness. It suggests a primary dimension of patient role 
evaluation, with particular reference to authority relationships. The ‘good’ patient is cooperative 
(21), good-tempered (2), agreeable (13), obedient (3) and respectful (7). Items suggesting positive 
motivation for hospital school activities (8) (12) (24) also share significant loadings with factor 4. 

While the negatively valued pole suggests the well-known ‘conduct problems’ or ‘antisocial 
behaviour’ factor found consistently in ratings of normal and delinquent children (Peterson, 
1961; Quay & Quay, 1965), there is here a clearer reference to the child’s acceptance-rejection 
of the sick patient role. The theoretical significance of such a dimension is well documented by 
medical sociologists (Parsons, 1951; Lorber, 1975). 


Factor 2. Expressiveness or social extraversion. The highest and purest loadings are for 
non-evaluative items, indicating the extent to which the patient asks questions concerning 
aspects of hospital life (28), medical treatment (20) and illness (16). High scorers are also 
generally curious (4) and more excitable (18); they freely express their feelings (26), tend to be 
talkative (1), defiant (23), to show off (11), to have many friends (17) and to stand out as leaders 
amongst patients (15). This latter group of social extraversion items, however, also shows 
moderate loadings on other dimensions, suggesting that it is the expressive behaviour of the 
patient as revealed in her cognitive, affective and social activities which unites the various 
extraversion components of this dimension. 


Factor 3. Emotional anxiety. Only items describing patients as tense-relaxed (10) and easily 
upset – not easily upset (14) have high and pure loadings on this factor. Other associated items 
link with negative aspects of sociability; the patient tends to be miserable (6), shows poor 
self-control (22), social withdrawal (5, 17, 1), excitability (18) and is generally submissive (15, 23, 
19). Taken together, all contributory loadings reflect three major first-order traits, which combine 
to define second-order neuroticism in Cattell’s system: these are (a) emotional instability, (b) 
tension, (c) guilt proneness, timidity, withdrawal and shyness (Cattell & Scheier, 1961). In 
Eysenck’s taxonomy, the dimension reflects introverted neuroticism, or ‘personality problems’ 
(Peterson, 1961). 

Table 2. Mean staff behaviour ratings. Loadings of items on five varimax factors 

                                            Varimax factors 
Scale item                              I       II      III     IV      V 

21 Uncooperative                        88      22      15      -17     -08 
 2 Bad-tempered                         84      09      20      -17     -06 
13 Quarrelsome                          82      27      17      -03     -01 
 3 Disobedient                          81      26      -?4     -13     02 
 7 Disrespectful                        80      23      02      -21     -06 
 9 Unaffectionate                       65      -29     19      -10     18 
 5 Unfriendly                           65      -43     42      -16     02 
25 Little concern for others            65      -41     27      -16     -01 
23 Defiant                              52      66      -30     12      -14 
11 Shows off                            42      74      -27     05      -16 
17 Has many friends                     -48     58      -40     08      04 
28 Asks questions (hospital life)       -07     8?      -06     06      06 
 4 Full of curiosity                    -09     83      -25     06      07 
26 Expresses feelings                   14      82      04      02      -14 
20 Asks questions (treatment)           01      81      22      14      06 
 1 Talkative                            02      81      -4?     -03     -07 
16 Asks questions (illness)             -01     80      20      25      08 
15 A leader                             12      65      -41     30      08 
18 Excitable                            31      64      39      -14     -15 
10 Tense                                27      -14     76      -24     -03 
14 Easily upset                         32      06      74      -13     -24 
 6 Miserable                            60      -24     59      -12     00 
22 Poor self-control                    63      15      47      -36     -15 
 8 Good concentration                   -44     07      -17     76      06 
12 Eager to learn                       -49     16      -19     71      -06 
24 Alert                                -43     40      -20     65      03 
19 Acts superior                        36      45      -31     45      -06 
27 Helps herself                        00      00      -18     01      94 

Percentage of total variance            25.79   26.54   11.65   8.29    4.10 


Factor 4. Ego strength. Highest loadings are shown for good concentration (8), eager to learn (12) 
and alert (24); lower loadings appear for acts superior (19), good self-control (22) and leadership 
(15). This factor has emerged from a number of analyses of children’s rated behaviour, but there 
has been little agreement about what to call it. Close to the present cluster is a group of items 
described by Cattell (1963) in teacher ratings (speed of learning, attentiveness and alertness), 
corresponding to his B (intelligence) factor in questionnaire data; on the other hand, it also has 
close affinities with Cattell’s Factor C (ego strength) which loads on rated emotional maturity, 
persistence and responsibility (Cattell, 1965). 

Other studies find evidence for a similar factor, variously labelled immaturity (Quay & Quay, 
1965), pro-social behaviour (Ross et al. 1965) and conscientiousness (Norman, 1963). Hallworth 
(1966) reported that the pupil characteristics of reliability, persistence and conscientiousness in 
teacher ratings loaded with emotional stability on the ‘good pupil’ dimension. Herbert (1974) set 
out to define this factor more clearly in teacher ratings, and successfully established its factorial 
independence; he named it competence since, as in the present study, the purest items suggested 
an optimum arousal, concentration and motivation underlying effective learning and 
performance. It has a close affinity with the construct of achievement motivation, particularly 
the persistence component, but likewise also with the impulsivity aspect of extraversion which 
clearly has implications for learning effectiveness. 

Whether this factor in the present case is situationally specific to a patient’s state of health, to 
the nature of treatment, or to relationships with teachers, appears unlikely in view of its 
emergence in many other studies, but these are testable hypotheses. The present cluster, 
however, is closest to that reported by Cattell (1965), and its motivational meaning may be 
tentatively indicated by the term ego strength; it suggests an optimum state of readiness, coping 
capacity and morale underlying effective role performance. 


Factor 5. Dependence-independency. This specific factor will not be considered further since it is 
loaded only on the single item ‘helps herself — seeks the help of others’ (27). This item was 
included on the hypothesis that dependency-independency would perhaps be dominant in ratings 
of patient behaviour, but the hospital staff appear to have utilized a more differentiated approach 
towards their rating task. 
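
The extraction and rotation procedure described for the mean staff ratings (averaging across judges, intercorrelating the items with unities in the diagonal, retaining components with latent roots over 1.0, and rotating by varimax) can be sketched as follows. This is an illustration under stated assumptions, not the original computation: the ratings are synthetic, the function names `principal_components` and `varimax` are hypothetical, and raw varimax without Kaiser normalisation is assumed because the text does not say which variant was used.

```python
import numpy as np


def principal_components(scores, min_root=1.0):
    """Unrotated loadings for components whose latent roots exceed min_root."""
    corr = np.corrcoef(scores, rowvar=False)   # item intercorrelations, unities in the diagonal
    roots, vectors = np.linalg.eigh(corr)      # eigh returns roots in ascending order
    order = np.argsort(roots)[::-1]            # re-order from largest to smallest root
    roots, vectors = roots[order], vectors[:, order]
    keep = roots > min_root
    loadings = vectors[:, keep] * np.sqrt(roots[keep])
    return loadings, roots


def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation (no Kaiser normalisation) by the usual SVD iteration."""
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = (rotated ** 2).sum(axis=0)
        target = rotated ** 3 - rotated * column_ss / n_items
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # product of orthogonal matrices
        if criterion and s.sum() / criterion < 1.0 + tol:
            break
        criterion = s.sum()
    return loadings @ rotation


# Synthetic stand-in for the data: 108 ratees, 28 scales, three judges.
rng = np.random.default_rng(0)
n_patients, n_items, n_latent = 108, 28, 4
latent = rng.normal(size=(n_patients, n_latent))
pattern = rng.normal(size=(n_latent, n_items))
judge_ratings = [
    latent @ pattern + rng.normal(scale=2.0, size=(n_patients, n_items))
    for _ in range(3)
]
mean_ratings = np.mean(judge_ratings, axis=0)  # average across the three judges

loadings, roots = principal_components(mean_ratings)
rotated = varimax(loadings)
percent_variance = 100.0 * (rotated ** 2).sum(axis=0) / n_items
print(rotated.shape, np.round(percent_variance, 2))
```

Because the rotation is orthogonal, each item's communality and the total percentage of variance extracted are unchanged by it; only the distribution of variance across factors shifts, which is what the bottom row of Table 2 reports for the real data.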


Principal components analyses (2): Separate staff group ratings — factor congruency 


In order to check the generality of the combined (means) analysis, identical varimax procedures 
were in turn applied to the ratings of the three independent groups of observers — the teachers (T), 
the experienced nurses (N1) and the unqualified and student nurses (N2). 

The teachers’ analysis extracted six factors accounting for 69.84 per cent of the variance, 
whilst N1 (76.53 per cent) and N2 (71.54 per cent) each extracted five factors. On inspection, an 
overall similarity was found between the three group structures, and these corresponded in general 
to the five dimensions of the previous combined (means) analysis. 

However, there were discrepancies also, and in order to establish the strength of the 
relationships more objectively, the 28-item varimax loadings for each factor were intercorrelated 
within and between observer groups, as well as with the sets of factor loadings of the combined 
analysis. Product moment correlations were used in these tests of factor congruency, an 
acceptable procedure where both scale items and subjects are constant as here. However, using 
this method it is necessary to tolerate a degree of correlation between otherwise orthogonal 
dimensions, and to accept only very high correlations as evidence for congruency. 

In order to establish base-line comparison levels for correlations, the coefficients as calculated 
between loadings of orthogonal varimax factors were first inspected in the case of the means 
analysis. These ranged from 0.05 to 0.59. Similarly, correlations within each of the three 
observer groups revealed a comparable range, from 0.01 to 0.47 for group T, 0.01 to 0.58 for N1, 
and 0.08 to 0.41 for N2. Therefore it is suggested that evidence for congruency across groups 
should require correlations in excess of these upper levels, that is beyond 0.60 approximately. 
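
The congruency check described above, in which the columns of factor loadings from two solutions are intercorrelated over the scale items and only values beyond about 0.60 are accepted, can be sketched in a few lines. The loadings below are hypothetical toy values for six items and two factors, not figures from this study, and the helper names are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Product moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def loading_correlations(solution_a, solution_b):
    """Correlate every factor (column) of one solution with every factor of another.
    Each solution is a list of rows, one row of loadings per scale item."""
    columns_a = list(zip(*solution_a))
    columns_b = list(zip(*solution_b))
    return [[pearson(a, b) for b in columns_b] for a in columns_a]


def congruent_pairs(matrix, threshold=0.60):
    """Factor pairs whose loading patterns correlate beyond the threshold."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, r in enumerate(row)
        if r > threshold
    ]


# Hypothetical varimax loadings from two rater groups on the same six items.
group_a = [[0.85, 0.10], [0.80, 0.20], [0.75, -0.05],
           [0.10, 0.80], [0.15, 0.75], [-0.05, 0.70]]
group_b = [[0.80, 0.15], [0.82, 0.10], [0.70, 0.05],
           [0.20, 0.78], [0.05, 0.72], [0.00, 0.74]]

matrix = loading_correlations(group_a, group_b)
print(congruent_pairs(matrix))  # [(0, 0), (1, 1)]
```

Passing the same solution as both arguments gives the within-solution correlations between orthogonal factors, which is how a base-line level of the kind reported above would be obtained.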

A second analysis correlated each set of group factor loadings in turn with the loadings of the 
means solution. Table 3 in effect shows part-whole correlations, indicating from the point of 
view of loading patterns, how the various observer dimensions have contributed to the combined 
(means) structure. Whilst there are a number of instances of overlap, there is nevertheless a 
sufficient basis for identifying a prominent group source factor associated with one means factor 
rather than another, as indicated by the higher correlations bracketed along the diagonals. These 
correlations support a conclusion that the separate factor structures do generally match the 
combined means structure, in spite of the use of blind rotational procedures. 
Table 3. Intercorrelations of factor loading patterns for rater groups vs. combined (means) 
solutions 

                       Combined (means) factors 
             Non compl.   Expr.     Anx.      Ego       Indep. 
Factor       17           18        19        20        21        % variance 

Group N1 
 1           (0.94)       -0.21     0.51      -0.44     -0.14     23.78 
 2           0.18         (0.89)    -0.49     -0.04     -0.16     22.51 
 3           0.63         -0.27     (0.90)    -0.67     -0.24     11.94 
 4           -0.61        -0.04     -0.42     (0.93)    0.33      10.45 
 5           0.11         -0.62     0.06      0.17      (0.46)    8.44 
                                                                  76.53 
Group N2 
 6           (0.95)       -0.06     0.50      -0.77     0.08      24.91 
 7           -0.41        (0.88)    -0.48     0.30      0.08      18.92 
 8           0.48         -0.39     (0.92)    -0.33     -0.22     8.12 
 9           0.27         0.68      -0.49     (0.24)    -0.23     14.25 
10           0.20         -0.28     -0.02     0.11      (0.78)    5.34 
                                                                  71.54 
Group T 
11           (0.79)       0.41      0.04      -0.37     -0.23     15.93 
12           -0.06        (0.81)    -0.01     0.01      -0.12     14.67 
13           0.47         -0.28     (0.88)    -0.62     -0.35     10.00 
14           -0.69        -0.06     -0.49     (0.86)    -0.06     9.30 
15           -0.23        0.06      -0.33     -0.07     (0.81)    4.47 
16           0.67         0.52      -0.62     -0.39     0.01      15.47 
                                                                  69.84 

Note 

(1) Brackets indicate predicted matching factors; correlations in excess of 0.60 offer evidence for 
congruency. n = 28 scale items. 

(2) The final column indicates the percentage of common variance accounted for by each rater-group 
factor. In general the relative importance of a given factor (in variance terms) remains much the same 
across different rater-groups. 


On the basis of an inspection of factor content, combined with the evidence of Table 3, 
further correlations were predicted between sets of loadings for the 16 factors of the group 
varimax analyses. The intercorrelations are shown in Table 4, and once again it is possible to 
find matching sets of loadings across independent analyses, which exceed the upper level of 
expectancy (0.60) previously established for within-groups correlations between orthogonal factors. 
Again, discrepancies are shown by overlapping rather than totally mismatching loading patterns, 
except that where comparisons with N2 are involved, there is no consistent matching for the ego 
strength dimension. For N2 (the inexperienced nurses), ratings of ego strength identify with 
other dimensions of judgement. It is suggested that since ego strength assesses motivation for 
learning or task competence, it is most salient for staff who have major responsibilities for 
teaching and instructing the patient in educational and medical tasks (N1 and T). The 
correlations in both Tables 3 and 4 suggest further that the N2 judges have perhaps confounded 
compliance with high ego strength or competence, thus missing an important motivational 
distinction between passive acceptance of the patient role and a more positive striving for 
excellence in instrumental role performance. On the whole, however, the correlational analysis 
supports the prediction that groups N1, N2 and T have utilized similar rather than different 
classification systems in rating behaviour for a common patient sample. 

Table 4. Intercorrelations of factor loading patterns between three independent group solutions 
(n = 28 scale items) 

(a) Experienced (N1) vs. inexperienced nurses (N2) 

                                            (N2) 
                       6            7         8         9         10 
                       Non compl.   Expr.     Anx.      Ego       Indep. 
(N1) 
 1. Non compl.         (0.84)       -0.51     0.56      0.16      0.25 
 2. Expr.              0.13         (0.64)    -0.45     0.81      -0.20 
 3. Anx.               0.64         -0.48     (0.84)    -0.21     -0.03 
 4. Ego strength       -0.76        0.26      -0.26     (-0.08)   0.27 
 5. Indep.             -0.08        -0.69     -0.01     -0.13     (0.49) 

(b) Experienced nurses (N1) vs. hospital teachers (T) 

                                            (T) 
                       11           12        13        14        15        16 
                       Non compl.   Expr.     Anx.      Ego       Indep.    ? 
(N1) 
 1. Non compl.         (0.67)       -0.06     0.46      -0.59     -0.22     0.70 
 2. Expr.              0.64         (0.62)    -0.36     -0.09     -0.01     -0.39 
 3. Anx.               0.21         -0.03     (0.86)    -0.59     -0.31     0.60 
 4. Ego strength       -0.50        0.04      -0.58     (0.79)    0.21      0.30 
 5. Indep.             0.02         -0.76     0.05      0.05      (0.48)    0.19 

(c) Inexperienced nurses (N2) vs. hospital teachers (T) 

                                            (T) 
                       11           12        13        14        15        16 
                       Non compl.   Expr.     Anx.      Ego       Indep.    ? 
(N2) 
 6. Non compl.         (0.71)       -0.06     0.52      -0.80     -0.13     0.61 
 7. Expr.              -0.03        (0.73)    -0.39     0.30      0.03      -0.69 
 8. Anx.               -0.02        0.04      (0.81)    -0.29     -0.31     0.55 
 9. Ego strength       0.67         0.35      -0.38     (-0.09)   -0.13     -0.18 
10. Indep.             -0.03        -0.19     -0.17     0.05      (0.58)    0.26 


Correlations between factor scores 


Since the previous analyses provided evidence supporting the congruency of factors across 
separate group structures, and had also revealed an average correlation of 0.40 between rating 
groups across the total 28-item schedule, it was expected that the more stable weighted factor 
scores would correlate moderately if not highly between groups. This was not the case. 
Correlations between group ratings within matched dimensions did not exceed 0.41, although 10 
of 12 relevant correlations were statistically significant at the 5 per cent level; for non-matching 
dimensions, however, factor score correlations between groups were much lower, thus 
providing some evidence for discriminant validity. For brevity the results are shown in Table 5 
as average correlations of factor scores between different groups, for both matching dimensions 
(along the diagonal) and for non-matching dimensions (above the diagonal). 

Table 5. Average correlations between three sets of group factor scores for matching and 
non-matching dimensions (n = 108 ratees) 

               Non compl.   Expr.     Anx.      Ego 
Non compl.     0.25*        0.11      0.11      0.20† 
Expr.          -            0.36*     0.07      0.11 
Anx.           -            -         0.17      0.13 
Ego            -            -         -         0.23† 

* r ≥ 0.25, P < 0.01. 
† r ≥ 0.19, P < 0.05. 

In effect, the inter-judge agreement for weighted factor scores is slightly lower than the 
average agreement between raters for original scale scores (0.40). It is possible that even minor 
discrepancies of factor composition may have had drastic effects on inter-judge reliability. 
However, this is unlikely, since an examination of item correlations within each group analysis 
revealed consistently high intercorrelations (typically beyond 0.60) for salient scale items defining 
each dimension. The internal consistency of the homogeneous factorially derived dimensions is 
therefore quite satisfactory. 
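
The significance levels attached to Table 5 (r of at least 0.19 for P < 0.05 and at least 0.25 for P < 0.01, with n = 108) can be checked approximately with a short sketch. The paper does not state which test was used; the Fisher z approximation below is an assumption chosen for illustration, and the function names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_statistic(r: float, n: int) -> float:
    """Approximate standard normal deviate for testing r against zero."""
    return atanh(r) * sqrt(n - 3)


def is_significant(r: float, n: int, alpha: float) -> bool:
    """Two-tailed test of a correlation against zero (Fisher z approximation)."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(fisher_z_statistic(r, n)) > critical


n_ratees = 108
for r in (0.17, 0.19, 0.23, 0.25):
    print(r, is_significant(r, n_ratees, 0.05), is_significant(r, n_ratees, 0.01))
```

Under this approximation a correlation of 0.19 reaches the 5 per cent level and 0.25 reaches the 1 per cent level, whereas 0.17 (the matching anxiety coefficient) reaches neither, which is consistent with the markers in the table.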

Levy (1974) has discussed the kind of difficulties which may arise when estimating reliabilities 
in rating studies. The major problem in the present case is that the mixed composition of each 
rating group eliminates any possibility of correcting for level differences between individual 
judges. There is evidence for consistent level differences between groups N1 and T, with 
teachers tending to report significantly higher levels of non-compliant and anxiety behaviours 
than nurses; but these differences are less disruptive of inter-rater reliability than other 
uncontrolled variations between different raters within the same observer group. 

A final explanation for the poor inter-judge agreement is that judges have rated different 
samples of behaviour on the basis of their varying involvements in patient care and personal 
acquaintance. If this is so, then the issue is not necessarily one of reliability, but of validity; it 
suggests that a wider sampling of a patient’s behaviour has taken place across role relationships 
and over time, due to the use of three observers. At the cost of an inability to specify which role 
situation has provided the major source of behaviour in individual cases, the means analysis has 
perhaps provided a measure more representative of patients’ characteristic behaviours in the 
ward. The high correlations of group factor scores with their corresponding means factor scores 
(typically > 0.80) indicate that the combined structure represents a valid composite criterion 
having a high multiple correlation with the independently derived predictor group ratings. 


Conclusions 


A four-dimensional structure of behaviour ratings, strongly represented in studies by other 
researchers in this field, has been replicated in three independent groups of observers in a 
hospital setting. The common link in these studies is the successful separation of fearful and 
hostile emotional behaviours from a neutral dimension of sociability or social extraversion. This 
emotionality distinction has commonly been maintained in those studies where ‘personality 
problem’ and ‘conduct problem’ behaviours emerge as major factors, but in such cases the 
sociability dimension is typically lost. Such findings are readily explained as rotations of 
Eysenck’s E and N dimensions into the quadrants defined by introverted and extraverted 
neuroticism; but many investigators have been equally dissatisfied with the failure to distinguish 
between the fear-hostility components of emotional instability (Brand, 1972), and equally 
between the sociability, impulsivity, and dominance components of extraversion. 

It has been argued that any tendency for more differentiated structures to emerge from ratings 
of social behaviour is due to the influence of situational factors, and further findings from the 
present study throw light on this problem, but are yet to be reported. Studies of personality 
structure will, of course, pose different questions from those concerned with an explanation of 
situated behaviours, although ultimately solutions must be sought in terms of a person × situation 
interaction. 


It was not possible in the present case to separate the variance due to persons, judges and 
situations, and thus to establish the contribution of many different sources of error, as 
emphasized by Levy (1974). The confounding of individual rater differences within observer 
groups leaves some important questions concerning reliability and validity unanswered. However, 
the findings do suggest that averaging procedures can overcome major problems of rater 
reliability in poorly controlled field situations, and that the underlying assumption of the 
comparability of raters’ classification systems is tenable. 

Frequency and imagery in word recognition: Further evidence for an 
attribute model 


Peter E. Morris 


Murdock (1974) located the difference in recognition between common and rare words in the lower 
frequency of false positives to rare words. Morris & Reid (1974) have similarly located the superiority in 
recognition of easy to image (high I) words in their lower false positive rate. However, Gregg’s (1976) 
explanation of the word frequency effect in terms of encoding variability and the distinctiveness of word 
attributes predicts more hits as well as fewer false positives to rare words. Gregg's prediction was confirmed 
by the experiment reported below. A subsequent survey of dictionary definitions of words differing in 
frequency and I-value further supported Gregg’s model and was incompatible with that of Glanzer & Bowles 
(1976). A model based upon the likelihood of encoding variability and the distinctiveness of the attributes 
defining each word meaning accounts for the differences in recognition for items varying in I-value and 
frequency, and explains Galbraith & Underwood’s (1973) finding that abstract words are perceived as more 
common than concrete words. 


What processes underlie recognition memory? An explanation of word recognition should be 
compatible with a general explanation of the functioning of the memory system. The research 
discussed in this paper lends support to a model of recognition performance based on the 
marking and storage of lists of features or attributes which define a word’s meaning. Attribute 
models have successfully accounted for a wide range of memory phenomena, from the ‘tip of 
the tongue’ to semantic comparison times (e.g. Brown & McNeill, 1966; Herriot, 1974; Smith, 
Shoben & Rips, 1974). Such a model is, therefore, attractive. Initially, however, the research 
described below was motivated by a surprising similarity between the findings of Morris & Reid 
(1974) for the recognition of easily imaged (high I) and difficult to image (low I) nouns, and that 
by Grace (Murdock, 1974) for high- and low-frequency words. 

Morris & Reid (1974), in two experiments, found that there was no difference in hit rate for 
subjects who received mixed lists of high I and low I nouns. However, there were twice as 
many false positives to low I nouns as there were to high I nouns. This result has since been 
replicated several times (Jones & Winograd, 1975; Winograd, Cohen & Barresi, 1976). Morris & 
Reid ascribed the difference in false positives to a greater similarity in meaning between low I 
than between high I words. They showed that low I words appeared far more frequently as 
synonyms of other low I words in dictionary definitions than was the case for high I words. 
There is considerable evidence (e.g. Anisfeld & Knapp, 1968; Fillenbaum, 1969; Grossman & 
Eagle, 1970) that new words which are similar in meaning to words that were presented in the 
list to be remembered are often incorrectly reported as having been in the initial list. These 
investigators have argued that features of the old words that are shared by the new words are 
marked as having been in the list. At the time of testing items with several such marked features 
are reported as old items, even if they have not been presented. 

The author was reminded of the results of the experiments on the recognition of high I and 
low I words by a report of a similar pattern of results given by Murdock (1974), when word 
frequency was manipulated. Murdock (p. 67) described an experiment conducted by Madge 
Grace in which subjects went through a pack of cards with a word written on each card. The 
subject’s task was to report whether or not the word had occurred earlier in the pack. The 
words varied in frequency, and Grace found that there was no difference in hit rate for high 
frequency (high F) and low frequency (low F) words, but that there were twice as many false 
positives to high F words as there were to low F words. This result, of course, parallels that 
of Morris & Reid (1974) for high I and low I words, and suggests that the locus of the word 
frequency effect (the well-known better recognition of rare than of common words) lies in the 
false recognition of new items. Perhaps there is a common basis for the Morris & Reid and the 
Grace results? Murdock argued that word recognition may be based on ‘negative certainty’ 
rather than positive certainty. He did not elaborate this view, except to suggest that high F items 
may share more attributes than low F items, making it more difficult to identify new high F 
items. 

A finding such as Grace’s which appears to locate the word frequency effect needs replicating, 
especially since (a) the original experiment has not been formally reported except by Murdock, 
and (b) there may have been a ceiling effect reducing the difference between the number of hits 
for high F and low F items. The probability of a hit for both groups exceeded 0.9. The 
experiment described below was therefore conducted to examine the distribution of hits and 
false positives to high F and low F items, when I value is controlled. Between the conducting of 
the experiment and the preparation of this paper two important studies of the word frequency 
effect were published, and, since these lead to different predictions from those of Grace’s 
experiment, they will be described now. 

Gregg (1976) reviewed the research on the word frequency effect, without mentioning Grace’s 
experiment. He concluded that a combination of a multiple-component (attribute) model, and 
encoding variability differences provided the best explanation of the word frequency effect. From 
his model it is to be expected that there will be both fewer false positives to low F nouns, and 
more hits. He predicted fewer false positives to low F nouns because such nouns should have 
more distinctive lists of attributes than high F nouns which share more common attributes. This 
claim has also been made by others (e.g. Lockhart, Craik & Jacoby, 1976). He predicted more 
hits to low F nouns because low F nouns have fewer alternative meanings (Reder, Anderson & 
Bjork, 1974) and so are less likely than high F nouns to be encoded differently at the test from 
the initial encoding at the first presentation. High F nouns, with several alternative meanings, 
may be encoded differently at presentation and test, with the result that different sets of 
attributes are retrieved. If a new set of attributes is retrieved, they are unlikely to have been 
marked as having occurred at the presentation stage, and subjects will incorrectly report the item 
as new. Thus Gregg’s model predicts both more false positives and fewer hits for high F nouns. 

The second important publication is by Glanzer & Bowles (1976). They allowed subjects to 
work through a pack of cards in their own time. Each card had on it either a high or a low F 
word. Recognition was then tested by a forced choice procedure with the subjects choosing 
which of two presented items had been in the pack of cards. The most interesting conditions in 
the Glanzer & Bowles study involved subjects choosing between high F and low F nouns where 
either both or neither had been in the pack. They found that when the comparison was between 
new high F and low F items, subjects reported the high F noun as the old item on 67 per cent of 
the trials. When both items were old, the low F item was selected as the old item on 68 per cent 
of the trials. This suggests that, in the unusual situation of being forced to choose, when either 
neither, or both choices would be correct, subjects are more accurate at recognizing both old and 
new low F words than they are in recognizing high F words. 

To summarize, therefore, Murdock (1974) presented data which suggest that the frequency 
effect for word recognition results from more false positives occurring to high F nouns, a result 
similar to that found by Morris & Reid (1974) for I-value of nouns. Gregg (1976) proposed a 
model which predicts both more false positives to high F nouns, and fewer hits. Glanzer & 
Bowles (1976) gave data, in a rather different situation, which would seem to support Gregg 
rather than Murdock. The experiment described below examined the hit and false positive rate in 
a situation in which there was less opportunity for the ceiling effect that may have biased 
Grace’s experiment, and with slightly stricter control of study rates for the words than in the 
Glanzer and Bowles experiment. 
Method 
Subjects 


First-year Psychology students at Lancaster University were group tested during a lecture period. Prior 
to scoring, any response sheets that were not fully completed according to instructions were eliminated. 
Eighty-four subjects met the criteria and their performance was analysed. 


Materials 


Fifty high frequency (high F) and 50 low frequency (low F) words as measured by the Thorndike-Lorge 
(1944) word count were selected in the following way. The Paivio, Yuille & Madigan (1968) word list was 
subdivided into five categories on the basis of ratings of I-value, namely 2–2.99, 3–3.99, 4–4.99, 5–5.99 and 
6–7. From each of these, equal numbers of high F and low F words were selected. All high F words were of 
AA frequency on the T-L count. Forty-five of the low F words had a frequency of one per million. The 
remaining five words had a frequency of less than one per million. The words were selected to control for word 
length, so that the mean length for high F words was 6.7 letters, and for low F words was 6.8 letters, with a 
range from five to nine letters in both cases. 

The sets of high F and low F words were then each divided into two sets of 25 words, each set containing 
the same numbers of words from the I-value subdivision. One set of high F words was randomly mixed with 
a set of low F words, to form list A. The other two sets formed list B. 

Presentation lists were prepared on which each list was typed, with double spacing, in two columns, on 
A4 paper. There were 25 words to a column, with the list label at the top of the page. A response sheet was 
also prepared on which all 100 words were typed in a randomized order in four columns with a dotted line 
alongside each word for the subject’s response. 


Procedure 


The presentation lists A and B were distributed alternately to the subjects, face downwards. The subjects 
were told that the list consisted of 50 words, in two columns. When instructed to do so, they were to turn 
over the page and read through the lists of words. They would be allowed an average of 2 sec to look at 
each word. They would be told when they should be half-way through the first list, when they should be 
starting the second list, when they should be half-way through the second list, and when to stop and turn 
over the page. After some time they would be tested on their recognition of the words. They were to note 
whether their list was labelled A or B. This procedure was followed, and afterwards the sheets were 
collected. 

After 20 min during which the lecture was continued, the response sheets were distributed. The subjects 
were told to label the sheet with the letter of their presentation list, and then to go through the words on the 
sheet putting a tick against every word from their original list and a cross against every new word. No 
specific time limit was given, but they were requested not to linger too long over items on which they were 
unsure. It was emphasized that a response must be made to every item. All the subjects completed the task 
in a few minutes, and the sheets were collected. 


Results 
For each subject, the number of hits and false positives were calculated for the high F and the 
low F words. The mean values are given in Table 1. 

Inspection of the table suggests that there were fewer hits and more false positives to high F 
than to low F words, and this was confirmed by sign tests. Only eight subjects had more hits for 
high F words than for low F words, while 72 had more hits for low F words (z = 7.04, P < 0.0001). 
Fifty-nine subjects had more false positives for high F words, 20 had more false positives for 
low F words (z = 4.28, P < 0.0001). 


Table 1. Means and standard deviations of the numbers of hits and false positives to the high 
frequency and low frequency words (maximum possible score in each condition = 25) 


                             Hits              False positives 
                       High F    Low F        High F    Low F 
Mean                    12.2     17.0           7.0      4.3 
Standard deviation       3.8      3.2           3.8      2.8 
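The two z values reported for the sign tests can be checked with a short calculation. The sketch below assumes that the tests used the normal approximation to the binomial with a continuity correction, and that subjects with tied scores were dropped; neither detail is stated in the text, and the function names are illustrative only. Under those assumptions the reported values are reproduced.

```python
import math


def sign_test_z(n_plus, n_minus):
    """Normal approximation to the sign test with a continuity correction.

    n_plus and n_minus are the numbers of subjects whose difference went in
    each direction; tied subjects are assumed to have been excluded already.
    """
    n = n_plus + n_minus
    expected = n / 2.0           # expected count in either direction under H0
    sd = math.sqrt(n) / 2.0      # binomial SD when p = 0.5
    return (abs(n_plus - expected) - 0.5) / sd


def two_sided_p(z):
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hits: 72 subjects favoured low F words, 8 favoured high F words.
hits_z = sign_test_z(72, 8)
# False positives: 59 subjects had more for high F words, 20 more for low F words.
false_positive_z = sign_test_z(59, 20)

print(round(hits_z, 2), round(false_positive_z, 2))  # 7.04 4.28
print(two_sided_p(hits_z) < 0.0001, two_sided_p(false_positive_z) < 0.0001)
```

The counts sum to 80 and 79 rather than 84, which is consistent with a few subjects having tied scores on each measure.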


Discussion 


There are more hits and fewer false positives to low F nouns. This supports Gregg's model, is 
compatible with the findings of Glanzer & Bowles, and fails to replicate Grace’s results, which 
may be presumed to have resulted from a ceiling effect. The similarity between Grace’s results 
and those of Morris & Reid appears to have been coincidental. 

A new question can now be posed. Can Gregg’s model also account for Morris & Reid’s 
findings, as well as those of the present experiment? The higher frequency of false positives to 
low I words is compatible with the theory since Morris & Reid demonstrated that low I words 
were more similar in meaning than are high I words. From Gregg’s model, the explanation for 
the lack of difference in hits for high and low I words is that high I and low I words do not differ 
in their encoding variability between presentation and test. Doubt is cast on this assumption by 
the findings of Galbraith & Underwood (1973) who found that subjects not only rated abstract 
words as more frequent than concrete words, when objective frequency was matched, but 
reported that abstract words occurred in more contexts than concrete words. If abstract words 
appear in more contexts it may be because they have more alternative meanings. This does not 
necessarily follow, however, since they may appear in more contexts but be used with the same 
meaning. This would not be surprising since Morris & Reid found that abstract words are more 
frequently synonymous with other abstract words than are concrete words with other concrete 
words. 

To test the possibility that high I and low I nouns differ in encoding variability a dictionary 
count of the number of meanings given to the high I and low I words used by Morris & Reid 
was undertaken. At the same time a count of the meanings of the high and low F words used in 
the experiment described above was made. One reason for the latter count was to estimate the 
comparative opportunities for encoding variability in the I-value and F-value experiment, but it 
also provided data which help to distinguish between the models of the word frequency effect 
proposed by Gregg and by Glanzer & Bowles. 

Gregg’s model has already been described. Glanzer & Bowles postulate a model similar in 
some ways to Gregg’s. Both models share the explanation of differences in terms of changes in 
encoding variability and the inappropriate labelling of items as old because they are similar in 
meaning to items actually presented. However, Glanzer & Bowles do not introduce the concept of 
attribute sets defining the items. Their model is simply in terms of the meanings attached to the 
words. At presentation a random subset of the total set of meanings of a word is marked. 
Associates of these words may also be accessed and marked. At the test a new random subset 
of the meanings of the items is sampled. More hits occur to low F words because they have a 
smaller set of meanings from which this sampling can take place, so that the meanings sampled 
are more likely to have been marked than are those for high F words. There are more false 
positives to high F words because they elicit more associations and therefore more words are 
likely to be incorrectly marked as old at the presentation stage. On the basis of their model and 
the results of their experiment they estimated the number of meanings available for sampling for 
their high F words as 8.6 and for low F words as 5.1. These seem high estimates. The dictionary 
count should give an approximate estimate of the meanings available to subjects. 
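The qualitative argument in the random sampling account, that a smaller pool of meanings makes it more likely that a meaning sampled at test was already marked at presentation, can be illustrated with a simple calculation. The sketch below is not Glanzer & Bowles's own formulation. It assumes, purely for illustration, that the same number of meanings (k) is sampled uniformly at random on both occasions, that a hit requires at least one shared meaning, and that the pools contain 5 and 9 meanings (whole-number stand-ins for the estimates of 5.1 and 8.6).

```python
from math import comb


def overlap_probability(n_meanings, k_sampled):
    """Probability that two independent random subsets of size k_sampled,
    each drawn from the same pool of n_meanings, share at least one meaning."""
    if not 0 < k_sampled <= n_meanings:
        raise ValueError("k_sampled must lie between 1 and n_meanings")
    # Chance that the test sample avoids every meaning marked at presentation.
    no_overlap = comb(n_meanings - k_sampled, k_sampled) / comb(n_meanings, k_sampled)
    return 1.0 - no_overlap


low_f = overlap_probability(5, 2)   # small pool of meanings (illustrative)
high_f = overlap_probability(9, 2)  # large pool of meanings (illustrative)
print(round(low_f, 3), round(high_f, 3))  # 0.7 0.417
```

Whatever value of k is assumed, the overlap probability falls as the pool grows, which is the sense in which the account predicts more hits to low F words.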


Dictionary count of word meanings 

There are several problems in selecting a dictionary for a count of word meanings. The 
dictionary should be modern, include slang and colloquial meanings and provide an adequate list 
of word meanings without including too many obscure or archaic usages. The Penguin English 
Dictionary (Garmonsway, 1965) was chosen as best fulfilling those requirements. It has the 
advantage of having different usages clearly separated by semicolons, and specialist, slang and 
archaic usages individually marked. The dictionary is comprehensive, and it is likely that several 
of the meanings listed for a word may not be known to the average subject. However, 
overestimates of the number of word meanings available in the subjects’ lexicon should not alter 
the conclusions drawn below. A count was made of the number of word usages listed for each 
word used in the experiment described earlier, and by Morris & Reid (1974). While all the usages 
differ, many are similar, so the term ‘usages’, rather than ‘meanings’ is more appropriate. 

Two counts were made. One designated ‘strict’ which excluded all specialist, slang and 
archaic usages, and one designated ‘loose’ which incorporated all listed meanings. The results 
are summarized in Table 2. 


Table 2. The results of the dictionary count of word usages for high and low word frequency and 
high and low I-value 


                                                             High F    Low F     High I     Low I 
                                                             (n = 50)  (n = 50)  (n = 100)  (n = 100) 
Percentage of items with more than one usage (loose count)     98        44        B          80 
Percentage of items with more than one usage (strict count)    96        36        67         78 
Median number of usages (loose count)                          5.33      1.39      2.38       3.39 
Median number of usages (strict count)                         4.00      1.28      2.05       2.79 


Perhaps the most important count is that of the items with more than one usage, since these 
have an opportunity to be encoded differently on the presentation and test trials. It will be seen 
from the table that using either the strict or the loose count, the high F words have far more 
opportunity than low F words for different encoding since almost every word has an alternative 
usage. For the high and low I words the difference is much smaller. The difference is largest for 
the strict count, but even here it did not reach significance by a chi square test (χ² = 2.01, 
d.f. = 1, P > 0.1). The difference for high and low F was highly significant, even though there 
were only half as many items involved (χ² = 37.48, d.f. = 1, P < 0.001). 
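The chi square value for the frequency comparison can be reconstructed from Table 2. The sketch below assumes that the strict-count percentages for high F and low F words (96 and 36 per cent of 50 items) correspond to 48 and 18 words with more than one usage, and that a 2 × 2 test with Yates's continuity correction was used. The correction is not mentioned in the text; it is assumed here because it reproduces the reported statistic.

```python
def yates_chi_square(a, b, c, d):
    """Chi square for a 2 x 2 table [[a, b], [c, d]] with Yates's correction."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: high F, low F. Columns: more than one usage, only one usage.
# Counts are inferred from the strict-count percentages (an assumption).
high_f_multiple, high_f_single = 48, 2
low_f_multiple, low_f_single = 18, 32

chi_square_f = yates_chi_square(high_f_multiple, high_f_single,
                                low_f_multiple, low_f_single)
print(round(chi_square_f, 2))  # 37.48
```

With one degree of freedom the 0.001 critical value is 10.83, so the result is well beyond the reported significance level.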

The actual number of alternative meanings available is not particularly important to the Gregg 
model, although it is important for the Glanzer & Bowles model. It can be seen from the table 
that (a) the difference between the medians is far higher for frequency than it is for I value, and 
(b) that the numbers involved are considerably smaller than those estimated by Glanzer & 
Bowles. 

First, consider the adequacy of the Gregg model to explain the similarity in hits for high I and low I 
words. It is clear from the usage count that differences that may exist between the encoding 
alternatives for high and low I words are far smaller than those between high and low F words. 
The opportunities for high I and low I words to change encoding between trials are roughly 
equal. There does appear to be a small difference in the average number of meanings for each 
word. However, contrary to the assumptions of Glanzer & Bowles, the present author would 
argue that encodings tend to remain the same unless context demands an alternative 
interpretation. There are probably one, or two, common encodings of any word, and the word is 
encoded in this way unless the situation suggests otherwise. Evidence for this view comes from 
the conformity of subjects in their responses in word association tasks. Also, Morris & Reid 
(1973) found that subjects report forming the same image to a word on its second presentation 
on three-quarters or more of the trials. If words were frequently encoded differently this 
recurrence of the same image would not take place. Encoding variability will occur, and is part 
of the explanation of recognition differences offered in this paper. However, the selection of the 
particular meaning to be encoded is not likely to be the random process envisaged by Glanzer & 
Bowles. 

If high I and low I words do not differ greatly in encoding variability, then Gregg’s model 
predicts no differences in hit rate, as was found. 

Secondly, there is a considerable discrepancy between the present study and that of Glanzer & 
Bowles in the estimated number of word meanings available to subjects, especially for low F 
words. The difference cannot be ascribed to differences in the word samples, for, 
although the T-L count was used for the present experiment and the Kucera & Francis (1967) 
norms by Glanzer & Bowles, the frequency of the items in the different counts was roughly 
similar. Low F words cannot average 5+ meanings each as Glanzer & Bowles suggest. 

Their model can be questioned on several counts. First, they assume that at acquisition a 
random set of the total set of meanings of a word is marked and another random set drawn at 
the test. This is a convenient assumption for mathematical modelling, but is unlikely to bear 
much resemblance to psychological reality. The work of Bransford & McCarrell (1974) illustrates 
the importance of context to encoding. Where there is little context, it is more likely, as argued 
earlier, that the same encoding will take place. Nor is there evidence for more than one meaning 
being encoded at one time, and much has been made of the problems raised for the memory 
system by encoding specificity (e.g. Bower, 1970; Tulving & Thomson, 1973). 

The advantage of Gregg’s model is that it takes into account both the existence of different 
meanings, and of attribute sets defining those meanings. This model can account for the 
observed differences in hits and false positives when I-value and frequency are manipulated, 
while being compatible with the properties of the words revealed by an examination of their 
dictionary definitions. High F and low F words appear to differ both in their associated meanings 
(high F having many, low F having few), and in the distinctiveness of the attributes defining these 
meanings (high F nouns sharing more common attributes). High I and low I words differ little in 
the number of meanings each word has, but do differ in the distinctiveness of their attributes, 
with low I words having more attributes common to other words. 

It is possible, using Gregg’s model, to offer an explanation of Galbraith & Underwood’s (1973) 
finding that abstract words are reported as being more frequent than are concrete words. It has 
been argued here that abstract (low I) words share more common attributes than do concrete 
(high I) words. When abstract and concrete nouns are selected on the basis of equated objective 
frequency, the individual attributes of the abstract nouns will usually have occurred more 
frequently than those of the concrete nouns. This is so because other abstract nouns which share 
the attributes will have been encountered more frequently than will concrete nouns which share 
the attributes of the concrete nouns. The words will be equated for objective frequency, but the 
attributes of the abstract words will have been more frequently encountered. If judgements of 
frequency are made on the basis of the frequency with which the attributes have been encoded, 
then abstract words will be judged more frequent than concrete words. 

The finding of Ghatala & Levin (1976) that when concrete and abstract words are equated for 
their phenomenal frequency rather than their objective frequency, the usual advantage of 
concrete words in verbal discrimination learning disappears, can be interpreted in the light of the 
model suggested above. When the words are equated on the basis of their frequency, as judged 
by the subjects, then, according to the present model, they are being equated in the past 
experience of their attributes. If abstract nouns are selected so that the experienced frequency 
of their attributes is equal to that of the appropriate concrete nouns, the selection must have 
been such that abstract and concrete words with attributes equally common and distinctive, have 
been chosen. The usual advantage of distinctive attributes to aid verbal discrimination is now 
lost to the concrete nouns, and verbal discrimination learning is equated. 

Gregg’s model, therefore, not only accounts for the difference between the recognition of high 
F and low F nouns, but also for the recognition of high I and low I nouns. It predicts the 
different loci of failures in recognition for these dimensions, it is compatible with the evidence 
from dictionary definitions, and it can provide an explanation for the higher apparent frequency 
of abstract nouns. Finally, as stated in the introduction, it has the advantage of parsimony, 
since attribute models have accounted for so many of the phenomena of memory. 
Spatial errors made by infants: Inadequate spatial cues or evidence of 
egocentrism? 


J. Gavin Bremner 





Nine month old infants search correctly for an object which they have seen hidden in one position, but cease 
to do so after they have been moved to the opposite side of the display, searching instead at a position 
which is apparently defined egocentrically from their experience before movement. This error can be 
explained on the one hand in terms of response dependence or egocentrism, or on the other hand as due to a 
lack of adequate spatial cues to allocentric position. In order to distinguish between these hypotheses, 64 
nine month old infants were presented with a hidden object problem in which the two alternative positions 
had covers of different colours. The results show that infants could search correctly for an object in one 
location although they saw the problem from different sides. This result combines with those of other 
conditions to indicate that cover colour provides an adequate spatial cue, allowing the infant to specify 
position allocentrically, provided the correct cover maintains a stable position. 


In Piaget’s account of sensorimotor development, stage IV marks a crucial point in the 
development of the concept of the object, since for the first time the infant searches for an 
object which is out of view (Piaget, 1954). This new found ability is somewhat limited, however, 
since during stage IV the infant typically searches successfully for an object concealed in one 
position (A) but continues to search there after he sees the object hidden at a new position (B). 
This phenomenon, known as the stage IV, or AB (A not B), error has been replicated in many 
recent studies (Appel, 1971; Gratch & Landers, 1971; Bower & Paterson, 1972; Evans & Gratch, 
1972; Harris, 1973; Gratch et al. 1974), and is thus a reliable finding. 

Piaget takes the stage IV error as indicative of the limited nature of the infant's object 
conception. Although the infant of this stage has made the important step of engaging in manual 
search for a hidden object, the object is not yet located in a coherent spatial framework, and to 
this extent, remains ‘...at disposal in the place where the action has made use of it’ (Piaget, 
1954, p. 50). He suggests that the infant very rapidly associates the object with a particular 
place, and that he records this place solely in terms of the actual movements which he made 
when he originally retrieved it from its first position. This dependence on past actions leads him 
to repeat the same movement even when the object is hidden in a new place. This hypothesis 
views the infant’s behaviour as egocentric on two levels: (1) there is egocentrism in the sense 
that he still considers the object ‘at disposal’, and as a consequence, (2) the location of search is 
spatially egocentric since it depends upon the infant’s position. The latter part of the argument is 
weaker in form, since although it states that the infant’s definition of the object’s position 
depends upon his own position and orientation, it allows the possibility of the infant having a 
conception of objects with an existence independent of his actions. This is the form of the 
argument with which the present paper is concerned. 

Until recently, no experimental evidence has been brought to bear on the question of whether 
the position to which the infant erroneously returns is defined by him with reference to an 
egocentric or an allocentric spatial framework. This is the case because in the standard test for 
the stage IV error, position ‘A’ remains, both egocentrically and allocentrically, the ‘same’ place 
throughout the experiment. In a previous experiment (Bremner & Bryant, 1977), we provided 
a means of answering this question. The design of this experiment was based on a much earlier 
one (Tolman, Ritchie & Kalish, 1946) intended as a solution to a similar problem in animal 
learning. By movement of the infant half-way through the test session, this design allowed the 
relative importance of the egocentric and allocentric codes to be assessed. 
In our previous study, infants showed a strong tendency to reach in the same direction relative 
to themselves, and consequently to the same egocentric position as on initial trials. This 
happened despite changes in the spatial aspects of the task, such as hiding the object at the 
other position (standard stage IV task), or hiding the object at the same allocentric position with 
the infant relocated at the opposite side of the table. In the latter case, due to the symmetry of 
the two positions about the infant’s midline, his movement around the table reversed the position 
defined egocentrically on initial trials from ‘A’ to ‘B’. This manipulation left any allocentric 
definition of position unchanged, and the fact that infants erred consistently in this case by 
searching at position ‘B’ lends strong support to the egocentric hypothesis. 

The strength of the egocentric tendency was emphasized in this study by the fact that the two 
positions were clearly differentiated, the table being painted so that one position lay on a black 
background and the other lay on a white background. Despite this fact, infants in the new 
position repeated the egocentric response to a distinctly different side after seeing the object 
hidden in the old position. The fact that this result was the opposite of that obtained by Tolman 
et al. (1946) deserves consideration. They showed that rats found place learning considerably 
easier than response learning (turning in one direction from different starting points). It seems 
likely that this was due to the fact that the rats were provided with salient cues to allocentric 
position, such as distinctive lighting or olfactory cues (Ballachey & Buel, 1934). It might well be 
that in our previous study, distinguishing the table sides did not provide a cue to allocentric 
position which was salient for the nine month old infant. 

The present experiment was designed as a test of the egocentric hypothesis. If it could be 
shown that, given salient spatial cues, the nine month old infant has the ability to search at a 
position bearing an invariant relation to these cues, but with a varying egocentric position, then 
we would have strong evidence that his organization of space is not necessarily egocentric. 
Probably the most salient way to cue object location would be to differentiate the covers under 
which the object is to be concealed, since these are the intermediary objects to which the infant 
directs his action during search. Such an assumption is in line with findings in the literature on 
spatial contiguity in discrimination learning (Cowey, 1968). Accordingly, the present design 
follows that of our previous experiment closely, while using differentiated object covers instead 
of differentiated table sides as spatial cues. 


Method 
Subjects 


64 healthy babies (mean age 9 months 9 days, range 31 days) completed the experiment. A further 17 babies 
(mean age 9 months 7 days) were seen but were not considered, either because they were upset at the time, 
or because they showed no interest in the experiment. The 64 babies were divided into four groups of 16 
(Groups A, B, C and D). Of the 16 babies in each group, eight were male and eight were female. 


Apparatus 


The baby was seated in a baby chair facing a mid-grey table, 30 × 60 cm, and 38 cm high. Two 8 × 8 cm wells, 
3 cm deep, and with 14 cm rims, were located with their centres 28 cm apart in the long axis midline of the 
table. The table was oriented so that its long axis was perpendicular to the direction of the infant’s ‘straight 
ahead’. Two 16 × 16 cm dish cloths, one off-white and the other black, were used as covers for the wells. (An 
off-white cover was used because pure white stood out too strongly.) Distinct covers were used in order to 
test the salience of this distinction as a cue to object location. The baby chair was mounted on a double 
ended framework which allowed its accurate placing at either side of the table, and allowed the experimenter 
to move the baby towards and away from the table, from his position opposite the baby. 


Design 
Following a brief familiarization period, the experiment consisted of two major stages, each consisting of 
five hiding trials. For all four groups the procedure was the same in stage I: the object was hidden five times 
in the same place (A). Stage II, which followed immediately, always involved some change in the spatial 
aspects of the task (see Table 1). 
The infants in group A were moved around the table to its opposite side between stages I and II. For 
them, stage II trials involved the object being hidden at the same position (A) on the table, and under the 
same cover as in stage I trials. The infants in group B were similarly moved round the table, but for them 
stage II trials involved the object being hidden at the other position (B) on the table, and under the other 
cover, relative to those of stage I trials. Total reliance on an egocentric definition of position precludes 
successful performance on stage II trials for group A, since the hiding position (A) ceases to be the one 
specified by this system after the infant is rotated. For group B, however, the different table position (B) of 
stage II trials corresponds to the position defined egocentrically on stage I trials, due to the rotation of the 
infant. Hence, reliance on an egocentric strategy would result in success in stage II for group B. 

Success in stage II trials for group A might be taken as indication that the infant possesses the ability to 
code position allocentrically. However, this condition does not distinguish between (a) a response resulting 
from the ability to locate the hidden object in an allocentric framework, and (b) a response established to a 
particular cover irrespective of whether that cover invariantly cues the object's position. Group B was 
included as an attempt to distinguish between these alternatives. Correct response by this group would 
preclude the latter possibility, since it involves response to a different cover. 

Infants in groups C and D were not moved between stages I and II. For group C the covers’ locations 
were exchanged between stages I and II, while for group D they remained in the same locations throughout 
both stages. For both groups, during stage II the object was hidden at the side of the table opposite to that 
used in stage I (side B). This meant that for group C, in stage II the object was hidden under the same cover, 
at the opposite side of the table, while for group D, the object was hidden under the other cover at the 
opposite side of the table. Group D constitutes a control, since but for the use of perceptually distinct 
covers, they were given the standard problem for testing the stage IV error. Group C was included so that 
the salience of cover colour as an indicator of the hidden object's location could be assessed. 

Table 1 describes the difference between stages I and II in terms of whether (a) the object’s egocentric 
position, (b) the cover under which the object was hidden, and (c) the room frame position of the object 
were the same (S) throughout both conditions, or differed between conditions (D). Since the table remained 
fixed throughout the experiment, table side and room frame position coincided in this experiment. 

Since the procedure adopted in this study was the same as that in our previous experiment (Bremner & 
Bryant, 1977), the data from the present groups A, B, C and D can be directly compared with the data of 
groups A, B, C and E of that experiment (see Table 1). 


Table 1. Design of the experiment (CC) showing its relation to the relevant groups of Bremner & 
Bryant’s experiment (TSC) 

         Stage I                        Stage II 
         No. of   Inter-stage          No. of   Correct egocentric   Correct cover   Correct table   Correct room 
Group    trials   change               trials   position             colour          side colour     frame position 

Experiment ‘CC’ 
A        5        Child rotated        5        D                    S               -               S 
B        5        Child rotated        5        S                    D               -               D 
C        5        Covers reversed      5        D                    S               -               D 
D        5        None                 5        D                    D               -               D 

Experiment ‘TSC’ 
A        5        Child rotated        5        D                    -               S               S 
B        5        Child rotated        5        S                    -               D               D 
C        5        Table rotated        5        D                    -               S               D 
E        5        None                 5        D                    -               D               D 

D = different from stage I. 
S = same as stage I. 

Procedure 


Familiarization period. Infants were placed in the chair and allowed to familiarize themselves with the 
experimenter and novel objects such as the chair, table, and covers. It was judged important that the 
infant should become habituated to the covers to ensure that lifting them in the experiment proper was not a 
result of interest in the covers per se, a possibility suggested by Bower & Wishart (1972). Once the infant 
seemed happy in the situation, a novel toy was introduced (e.g. a padlock and chain, or a toy ladybird, etc.; 
the one which interested the infant was found by trial and error). 

Once the infant’s attention was captured by the toy, two ‘warm up’ trials were given. These took the form 
of partial hiding trials, and were given because it has been shown that such experience increases the 
likelihood of search for a hidden object on later trials (Miller, Cohen & Hill, 1970). The object was lowered 
into the well in which it was to be hidden later in stage I, and then was lifted out again. The lowering 
sequence was repeated three times, and the third time the object was left in the well with part of it 
protruding. Both covers were then drawn forward simultaneously to the same point on each well, leaving the 
object roughly three-quarters covered. During this time the infant was seated out of reach of the table. After 
a 3 sec delay, he was moved forward to the table and allowed to search for the toy. He was allowed to 
correct mistakes, and to play with the toy for about 10 sec when he retrieved it. 


Stage I. The five stage I trials followed immediately. The same procedure was followed in these as in the 
warm up trials, except that in this case the object was completely covered. The toy was hidden with the 
baby looking on. Again, 3 sec were allowed to elapse between hiding and search, and when the infant found 
the object he was allowed to play with it for about 10 sec. On each trial the experimenter covered both wells 
with their cloths. The location of the correct side (left/right) and the correct cover colour (black/white) were 
completely counterbalanced within each group. 


Stage II. Here the trial administration procedure was identical to that of stage I except that the infant was 
drawn back from the table immediately on making an error, and so was prevented from correcting his errors. 
However, the hiding location varied between groups as described in the design section. Between stages I and 
II the infants in groups A and B were carefully carried round to the opposite side of the table, every attempt 
being made to keep their attention on the table. Infants in groups C and D remained stationary between 
stages I and II, and in the case of group C, the experimenter reversed the positions of the covers while the 
infant looked on. 

In both stages, and including the transition interval between stages I and II, the inter-trial interval was held 
as near 15 sec as possible. 


Results 
Stage I 


The scores in stage I show that, on the whole, infants were perfectly well able to search 
correctly for the object at its initial location. Table 2 shows that on each trial at least 12 of the 
16 infants in each group searched for the toy at the correct location (binomial P = 0.038), 


Table 2. Number of infants (out of 16) making errors for each trial 





Stage I trials Stage II trials 

Stage II 


The crucial trial of this stage is the first one, since the infant’s performance on subsequent trials 
may be affected by his previous stage II experience as well as by his experience in stage I. The 
scores for the first trial are presented in Table 2. Groups A and B stand out for their low error 
rate. A greater number of infants than expected by chance were correct in both groups (binomial 
probability: group A, P = 0.038; group B, P = 0.001). Pairwise χ² comparisons show that there 
was no significant difference between groups A and B (χ²[1] = 0.95, P > 0.3), but that both these 
groups made significantly fewer errors than control group D (groups A vs. D, χ²[1] = 4.5, 
P < 0.05; groups B vs. D, χ²[1] = 9.35, P < 0.01). Additionally, group B made significantly fewer 
errors than group C (χ²[1] = 4.16, P < 0.05). 
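
The binomial probabilities quoted for the first stage II trial can be checked with a short calculation. The sketch below assumes a one-tailed test against chance (p = 0.5) with 16 infants per group; the paper does not state the tail used, and the function name is purely illustrative.

```python
from math import comb

def binomial_upper_tail(successes: int, n: int, p: float = 0.5) -> float:
    """Probability of at least `successes` correct searches out of n under chance."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(successes, n + 1))

# Group A: 12 of 16 infants correct (4 errors) on the first stage II trial
p_group_a = binomial_upper_tail(12, 16)
# Group B: 15 of 16 infants correct (1 error) on the first stage II trial
p_group_b = binomial_upper_tail(15, 16)

print(f"group A: P = {p_group_a:.3f}")
print(f"group B: P = {p_group_b:.4f}")
```

Under these assumptions the group A value is about 0.038, in agreement with the reported figure, while the group B value falls below the reported 0.001.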

The implications of these results become clearer when they are compared with the 
corresponding results in our previous experiment (Bremner & Bryant, 1977). Any difference in 
scores between equivalent conditions of these two experiments would indicate a difference in the 
effect of cover colour versus table side colour as spatial cues, since identical procedures were 
adopted in all respects other than the type of position cue used. Since the intended comparison 
is between the effects of cover colour versus table side colour, the present experiment and our 
previous one shall henceforth be referred to as experiments ‘CC’ and ‘TSC’ respectively. 

As is shown in Table 3, the main difference between experiments appears in the scores for the 
two A groups. Performance in experiment ‘CC’ clearly surpassed that recorded in experiment 
‘TSC’ (χ²[1] = 4.5, P < 0.05). This difference may be taken as indication that, for this condition 
at any rate, cover colour constituted a more salient spatial cue than table side colour, to the 
extent that the majority of infants in group A of the present experiment searched successfully 
despite the fact that successful search involved abandoning search at the place specified 
egocentrically from stage I trials. A χ² comparison of the two C groups shows that although 
performance in experiment ‘CC’ was better than in experiment ‘TSC’, this difference was not 
significant (χ²[1] = 1.14, P > 0.2). In addition, group C performance in experiment ‘CC’ did not 
depart significantly from chance (binomial P = 0.4). Thus, only in the condition where the 
allocentric position of the correct cover remained invariant between stages I and II, and the 
infant’s location was changed, did cover colour constitute a sufficient cue to object location for 
the infant to change the direction of his response and go to the correct place. 
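
The between-experiment χ² values can be reproduced from the first-trial error counts (out of 16 infants per group). The sketch below assumes that a 2 × 2 χ² with Yates' continuity correction was used; the paper does not say so, but the reported values are consistent with that assumption. The function name is illustrative.

```python
def chi_square_yates(errors_1: int, errors_2: int, n_per_group: int = 16) -> float:
    """Yates-corrected chi-square (1 d.f.) for a 2 x 2 table of errors vs. correct searches."""
    a, b = errors_1, n_per_group - errors_1
    c, d = errors_2, n_per_group - errors_2
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, but never below zero
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected**2 / ((a + b) * (c + d) * (a + c) * (b + d))

a_groups = chi_square_yates(4, 11)   # group A first-trial errors: 'CC' vs. 'TSC'
c_groups = chi_square_yates(7, 11)   # group C first-trial errors: 'CC' vs. 'TSC'

print(f"A groups: chi-square = {a_groups:.2f}")
print(f"C groups: chi-square = {c_groups:.2f}")
```

With these inputs the function gives roughly 4.52 for the two A groups and 1.14 for the two C groups, close to the values reported in the text.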

In experiment ‘CC’ the mean error rate over all five stage II trials follows a similar pattern to 
the first trial data (see Table 2). The differences between groups, however, were not so marked, 
due to a tendency toward chance performance as the stage progressed. A four-way analysis of 
variance (group × sex × trial side × cover colour) performed on the error scores over the five trials 
for groups A, B, C and D of experiment ‘CC’ and groups A, B, C and E of experiment ‘TSC’ 
showed a significant difference between groups (F = 3.6, d.f. = 7, 64, P < 0.01). A Newman-Keuls 
range test indicated that only two group differences occurred between experiments. Both 
group A and group B of experiment ‘CC’ performed significantly better than control group E of 
experiment ‘TSC’ (group A, P < 0.05; group B, P < 0.01). The group A result suggests a 
difference between experiments on this measure, since within experiment ‘TSC’, group A did 
not differ significantly from control group E (P > 0.05). This suggests that the difference between 
the two A groups shown in the first trial data was also present in the data for the whole stage. 
This result is not clear cut, however, since the difference between A groups over the whole stage 
was not significant. 

The analysis of variance showed no significant sex differences, and no effects of side or colour 
of stage II hiding, nor were there any significant interactions between factors. 

Table 3. Number of infants (out of 16) making errors in stage II, and mean error for stage II, for 
experiment ‘CC’ and relevant conditions of experiment ‘TSC’ 

Experiment ‘CC’ 
         Correct egocentric   Correct cover   Correct room      1st trial   Mean 
Group    position             colour          frame position    errors      error 
A        D                    S               S                  4          1.3 
B        S                    D               D                  1          1.0 
C        D                    S               D                  7          2.0 
D        D                    D               D                 11          2.5 

Experiment ‘TSC’ 
         Correct egocentric   Correct table   Correct room      1st trial   Mean 
Group    position             side colour     frame position    errors      error 
A        D                    S               S                 11          2.5 
B        S                    D               D                  4          1.2 
C        D                    S               D                 11          2.9 
E        D                    D               D                 12          3.4 

D = different from stage I. 
S = same as stage I. 


Discussion 

This experiment provides striking evidence that the stage IV infant’s search for objects, or more 
generally his construction of space, need not be egocentrically defined. Although there is some 
evidence that the stage IV error is not response dependent (Evans, 1973; Butterworth, 1974), no 
study has produced evidence that the infant can use allocentric cues to the extent that they 
override the effect of egocentric tendencies, whether or not these are response dependent. 

The fact that group A of the present experiment performed essentially correctly on stage II of 
the task, while the corresponding group in our previous experiment erred consistently, shows 
that cover colour was a considerably more salient cue to allocentric position than table side 
colour. That this turned out to be the case is not surprising in the light of evidence that spatial 
contiguity of stimulus, response, and reward is an important factor in determining the speed with 
which discrimination learning is achieved in animals (Cowey, 1968). 

It seems safe to conclude that, given adequate position cues, the infant has the ability to 
search at one allocentric position consistently, despite the fact that due to his own movement 
this is no longer the position specified egocentrically. This is clearly an important ability for the 
infant to possess as he becomes mobile, since otherwise, direct return from different positions in 
a room, to a starting point or to an interesting object would be impossible when these are not 
directly visible. While the infant is relatively immobile, reliance on an egocentric definition of 
positions represents an adequate strategy. Thus it would come as no surprise to find that he 
relies on such a strategy, and only resorts to an allocentric strategy as his increasing mobility 
creates a growing number of anomalies in his world. It seems likely that initially the infant only 
uses an allocentric strategy when presented with particularly salient allocentric cues. Such cues 
are all-important in an allocentric specification of space, since when all cues are eliminated even 
the adult is forced to rely again on egocentric specification combined to a certain extent with 
memory of his displacements and reorientations. Additionally, it would make sense for the infant 
to rely on an allocentric strategy when he has been moved, since it is in these situations that he 
is likely to experience failure of the egocentric strategy. This would explain the superior 
performance of group A over group C, since only infants in group A were moved. Not only 
would the infant’s increasing experience gained from his movements lead him to rely on 
allocentric strategies when he had been moved, but this same experience might well lead to the 
ability to take into account his own displacements better than those of the object and its cues. 

This hypothesis, namely that the infant comes to rely more on allocentric strategies due to his 
growing lack of success with the egocentric strategy as he becomes more mobile, has a number 
of advantages as far as explanation of the present data is concerned. The fact that group A 
performed better than group C on stage II of the task can be explained in two different ways. 
Either, it might be that the infant’s movement between stages I and II in group A alerted him to 
a change in the situation, or it might be that the allocentric cue was used to guide search only 
when it remained stable, as was the case for group A alone. When this problem is viewed in the 
light of the present hypothesis, however, these alternatives become coordinated within a single 
explanation. According to this, the infant’s movement prompts him to use an allocentric 
strategy, and also, the assumption that he becomes better able to deal with his own 
displacements than with those of objects predicts that he should be better able to utilise stable 
spatial cues than unstable ones in applying an allocentric strategy. 

Although this experiment has shown that the nine month old’s organization of space need not 
be egocentric, we are left with the fact that in many cases it is. It remains to be seen to what 
extent the infant’s perseverative errors in the stage IV task are simply due to a response habit, 
or, at a higher level, to an egocentric understanding of space. If, as seems likely from recent 
work (Evans, 1973; Butterworth, 1974), the latter explanation proves appropriate, then it would 
still be necessary to determine to what extent such an understanding relies upon active 
experience for its genesis. 

As has been suggested elsewhere (Harris, 1975), it seems likely that the nine month old infant 
has a number of strategies at his disposal, some more readily so than others, and further, that he 
may use the one which is most appropriate to the situation as he sees it. When his perception of 
the situation is bolstered by adequate cues, he may be capable of performing at or near the adult 
level, while in their absence he is reduced to reliance on strategies which do not always yield 
success. As he becomes increasingly mobile, his growing lack of success with the egocentric 
strategy should lead to an increasing reliance on allocentric strategies in appropriate situations, 
and consequently, to a broadening of the range of adequate spatial cues. 


Cognitive style and lateral eye movements 


Ming-shiunn Huang and Brian Byrne 





The lateral eye movement paradigm was employed to test the hypothesis that narrow categorizers, who are 
believed to be more analytic in information processing, make characteristic use of the left hemisphere, while 
broad categorizers, being more holistic, depend more on the right hemisphere. Data for narrow categorizers 
confirmed the hypothesis in that they tended to produce a majority of right shifts. Results from broad 
categorizers turned out to be less clear, although there was a tendency in the direction of left eye shift. 
Additionally, narrow categorizers made significantly more total LEMs than did broad categorizers. The 
results are discussed in terms of cerebral specialization underlying cognitive style. 





In accounting for the functional asymmetry of the brain, clinical (Corkin, 1965; Milner, 1967; 
Bogen, 1969; Sperry, Gazzaniga & Bogen, 1969; Gazzaniga, 1970) and experimental (Geffen, 
Bradshaw & Wallace, 1971; Klatzky & Atkinson, 1971; Rizzolatti, Umilta & Berlucchi, 1971; 
Cohen, 1973; King & Kimura, 1972; Bever & Chiarello, 1974) evidence supports a 
double-dominance model of hemispheric organization, first proposed by Jackson (Taylor, 1932) and 
later revised and enlarged by Bakan (1971). In this model it is assumed that each hemisphere is 
dominant in different functions, specialized for different modes of information processing. The 
left hemisphere is organized for verbal-analytic processing, the right hemisphere for 
preverbal-holistic activity. 

Recent research on lateral eye movements (LEMs) provides another source of experimental 
evidence relevant to this model. It has been proposed by Bakan (1969, 1971) that the direction of 
LEMs provides a reliable index of which hemisphere is active in information processing. 
Specifically, right LEM signifies involvement of the left hemisphere, while left LEM is related to 
the activation of the right hemisphere. Experiments primarily designed to test this hypothesis 
employed one of the following two techniques: (a) within subjects, where the directions of 
LEMs to questions demanding different cognitive modes are observed and in which it has been 
found that verbal and numerical questions, which are generally seen as invoking left hemisphere 
activity, result in more right LEMs (Kocel, Galin, Ornstein & Merrin, 1972; Weiten & Etaugh, 
1974; Gur, Gur & Harris, 1975); (b) between subjects, in which an attempt is made to classify 
groups of subjects having different personality or cognitive characteristics as typically right or 
left movers. It seems that greater susceptibility to hypnosis and better perceptual-motor skills 
are found amongst left-movers (Bakan, 1969; Weiten & Etaugh, 1973; Gur, Gur & Harris, 1974), 
while higher verbal and quantitative abilities, better visual attention and faster reading speed, 
characteristically left hemisphere functions, are mainly found in right-movers (Bakan & 
Shortland, 1969; Weiten & Etaugh, 1973). In the present study we attempt to correlate a 
cognitive style variable, category width, with LEM in order to test if narrow and broad 
categorizers can be classified as right and left movers respectively and hence can be said to 
make characteristic use of the left and right hemisphere. 

Following Leff, Gordon & Ferguson’s (1974) definition of cognitive sets, cognitive style can be 
defined as an ‘in-built plan or programme to select specific types of data for processing or to 
perform specific mental operations on information processed’. For the purpose of this paper, 
discussion will be limited to ‘category width’, measured by Pettigrew’s (1958) C-W Scale, with 
broad and narrow categorizers as the extremes. Considerable research has contributed to a 
general understanding of the nature of the in-built programmes employed by narrow and broad 
categorizers, with experimental evidence suggesting a difference in information processing 
strategies. Narrow categorizers are consistently better at tasks requiring detailed or analytic 
processing, whereas broad categorizers are more efficient when a more integrated or holistic 
strategy is demanded. For instance, it has been found that narrow categorizers were better at 
discriminating photographed faces which were shown earlier in the experiment from faces new to 
the experiment (Messick & Damarin, 1964) suggesting a greater attention to detail. It seems that 
broad categorizers, preferring a greater variety of experience than the narrow categorizers 
(Taylor & Levitt, 1967), are able to process greater diversity of available stimulus input in a 
multi-attribute paired association learning task and are able to recall a significantly larger number 
of attributes than the narrow categorizers (Parsons, 1973). It therefore seems plausible to assume 
that narrow categorizers employ an analytic type of in-built programme while broad categorizers 
adopt a holistic approach to information processing. 

It appeared to us worthwhile, therefore, to test the following deduction; narrow categorizers, 
being more analytic in information processing would make characteristic use of the left 
hemisphere; broad categorizers, more holistic, would depend more on the right hemisphere. To 
test this hypothesis we employed the lateral eye movement paradigm. 


Method 
Subjects 


Pettigrew’s C-W Scale was administered to about 150 first-year psychology students. Subjects obtaining 
scores above 90 (n = 15) and below 47 (n = 18) were selected as broad and narrow categorizers respectively. 
These cut-off points were chosen because they have been used consistently for a number of our other 
experiments, which we plan to report later. It happened that all the 18 narrow categorizers and 13 out of 15 
broad categorizers were females. Since sex has been found to be a confounding variable in LEM experiments 
(Bakan, 1971; Gur & Gur, 1974; Weiten & Etaugh, 1974), the two male broad categorizers were excluded. 


Stimulus questions 


Forty stimulus questions were used. There were 22 factual questions (six numerical, six spatial, ten verbal) 
and 18 reflective questions. ‘Multiply 12 by 13’ is an example of a numerical question; ‘How many parts do 
three diameters divide a circle into?’ was used as a spatial question; ‘In what way are praise and punishment 
alike?’ as a verbal question; and ‘What would you think is the future of Vietnamese orphans adopted by 
Australian families?’ was a reflective question. Although the primary interest centres on identifying 
differences among subjects (narrow vs. broad categorizers), and hence the choice of a procedure likely to 
suppress the effect of question type (see below), we included these well-studied question groups to allow for 
analysis of some unanticipated effect, simple or interactive, of question type and cognitive style. 


Procedure 


The subjects were seated in a comfortable chair in a small room which was free of distracting stimuli. The 
experimenter (M.-S.H.), who had no knowledge about the subject’s scores on the C-W Scale, was seated 
facing the subject at a distance of about 1.5 m. This procedure was used because it has been found that 
subjects move their eyes predominantly in one direction, either right or left, regardless of problem type, 
when experimenter and subject are facing each other (Gur et al. 1975). Subjects were asked to think about 
the answer to each question before responding. Their initial direction of lateral eye movement following the 
completion of each question was recorded. Three categorizations were used: ‘right’, ‘left’ or ‘unclassifiable’. 
A response was recorded as ‘right’ or ‘left’ only when subjects shifted their gaze away from the 
experimenter in one direction at the end of the question reading, and held their gaze in that direction prior to 
responding. A response was recorded as ‘unclassifiable’ if the subject was not sharing gaze with the 
experimenter at the end of the question, rapidly shifted her gaze randomly, looked upwards or downwards 
or failed to shift her gaze at all. After all the questions were finished, subjects were asked to fill in the 
Edinburgh Handedness Inventory (Byrne, 1974). 


Results 


The data are presented in Table 1. Following the convention of Kocel et al. (1972), three subjects who 
made fewer than 20 classifiable (right or left) LEMs were excluded from the analysis. Two were 
narrow categorizers and one a broad categorizer. Results on the Edinburgh Handedness 
Inventory showed that all but one subject were strongly right-handed (see Byrne, 1974, for 
definition of handedness categories). This single subject, a left hander and a broad categorizer, 
was excluded from the analysis. Therefore, altogether there were 16 narrow categorizers and 11 
broad categorizers, all females, in the data analysis. 


Table 1. Lateral eye movements (LEMs) 

Narrow categorizers 
Subject   Total LEM   Right LEM   Right LEM (%) 
S1        35          16          45.7 
S2        27          26          96.3 
S3        39          37          94.8 
S4        32          24          75 
S5        38          33          86.8 
S6        33          32          96.9 
S7        36          24          66.6 
S8        34          30          88.2 
S9        33          11          33.3 
S10       23          22          95.7 
S11       39           8          20.5 
S13       34          26          76.4 
S14       32          26          81.2 
S15       34          20          58.8 
S16       32          31          96.8 
Mean      32.9        23.8        73.3 

Broad categorizers 
Subject   Total LEM   Right LEM   Right LEM (%) 
S1        36          24          66.6 
S2        24           6          25 
S3        21           4          19.1 
S4        29           9          31.1 
S5        29           0           0 
S6        24           9          37.5 
S7        23          13          56.5 
S8        26           2           7.7 
S9        30          19          63.3 
S10       35          35         100 
Mean      27.4        12.5        43.16 


To test the hypothesis that a greater number of narrow categorizers can be classified as 
right-movers and broad categorizers as left-movers, we looked firstly at the data on subjects who 
made 75 per cent or above left or right LEMs, classified as either left or right-movers 
respectively. (This cut-off point was used by Weiten & Etaugh (1974).) Amongst the 16 narrow 
categorizers, ten could be classified as right-movers, one as a left-mover, while four left-movers 
and one right-mover were found in broad categorizers. The Fisher exact probability test yields a 
value of P < 0.025 (two-tailed) of getting this pattern under the null hypothesis. 
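
The classification and the exact test can be sketched as follows, using the rows that are legible in Table 1. The sketch assumes that a mover is any subject with 75 per cent or more of classifiable LEMs in one direction, and that the two-tailed probability sums all tables no more likely than the observed one; the function names are illustrative and not taken from the paper.

```python
from math import comb

# (total LEMs, right LEMs) per subject, copied from the legible rows of Table 1
narrow = [(35, 16), (27, 26), (39, 37), (32, 24), (38, 33), (33, 32), (36, 24), (34, 30),
          (33, 11), (23, 22), (39, 8), (34, 26), (32, 26), (34, 20), (32, 31)]
broad = [(36, 24), (24, 6), (21, 4), (29, 9), (29, 0), (24, 9), (23, 13), (26, 2),
         (30, 19), (35, 35)]

def count_movers(rows):
    """Return (right-movers, left-movers) using the 75 per cent criterion."""
    # Integer arithmetic avoids rounding problems at exactly 75 per cent
    right = sum(1 for total, r in rows if 4 * r >= 3 * total)
    left = sum(1 for total, r in rows if 4 * (total - r) >= 3 * total)
    return right, left

def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact P for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1 + 1e-9))

narrow_right, narrow_left = count_movers(narrow)
broad_right, broad_left = count_movers(broad)
p_value = fisher_two_tailed(narrow_right, narrow_left, broad_right, broad_left)

print(narrow_right, narrow_left, broad_right, broad_left)
print(f"P = {p_value:.4f}")
```

On these rows the criterion yields ten right-movers and one left-mover among narrow categorizers, and one right-mover and four left-movers among broad categorizers, as reported; the resulting two-tailed probability is about 0.013, which is below the 0.025 level quoted.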

As a further approach to the analysis of the main hypothesis, the mean percentages of right 
LEMs for the narrow categorizers (73.4 per cent) and broad categorizers (43.2 per cent) were 
each compared against a null hypothesis value of 50 per cent. The result was significant for the 
narrow categorizers (t = 3.966, d.f. = 15, P < 0.01, two-tailed), but not for the broad categorizers 
(t = −0.747, d.f. = 10). (As the means above would suggest, broad and narrow categorizers are 
distinguishable from each other on proportion of right LEMs: t = 2.89, d.f. = 25, P < 0.01.) 

Quite independently of the above analysis on the direction of LEMs, the means of total 
number of LEMs for the narrow categorizers (32.9) and broad categorizers (27.4) were compared 
and found to differ significantly (t = 2.75, d.f. = 25, P < 0.025, two-tailed). 

Given the two results that narrow categorizers were characteristically right-movers, and that 
they made significantly more total LEMs, it would be expected that percentage right LEMs were 
significantly related to total number of LEMs in these data. Pearson’s product-moment 
correlation coefficient was calculated using all the 27 subjects and found to confirm this 
prediction (r = 0.564, P < 0.01, two-tailed). 

We also analysed the effect of question type on LEMs. We examined the direction of 
subjects’ eye movements to verbal/numerical (left hemisphere function) and spatial (right 
hemisphere) questions separately. Narrow categorizers as a group consistently made more right 
LEMs than left to both verbal/numerical (71.4 per cent) and spatial (77.4 per cent) questions. 
The picture for broad categorizers is different: as a group they produced 36.8 per cent right 
LEMs to verbal/numerical questions and 57.0 per cent right movements to spatial questions. 

To test if individuals within each group tended to process both types of questions consistently 
in one hemisphere, we calculated the correlation between each subject’s right LEMs to 
verbal/numerical questions and to spatial questions. The resulting correlations were 0.72 
(P < 0.01) and 0.64 (P < 0.05) for narrow and broad categorizers respectively. 


Discussion 

The data on the narrow categorizers confirmed our hypothesis; they are characteristically 
right-movers. Of the 11 who could be classified as consistent movers (75 per cent or more LEMs in 
one direction), ten moved to the right. Narrow categorizers as a group made a significantly 
greater percentage of LEMs to the right (73.4 per cent). In addition, they had a higher total of LEMs 
than the broad categorizers. 

However, the situation for broad categorizers is less straightforward. As a group, the direction 
of their LEMs was not significantly different from a chance 50 per cent (percentage left 
LEMs = 56.8 per cent); although four out of the five classifiable ones were left-movers, the 
picture for the majority of broad categorizers was less clear. Furthermore, they made 
significantly fewer total LEMs than narrow categorizers. 

In terms of hemispheric function, our data suggest that narrow categorizers make 
characteristic use of the left hemisphere. This fits with the notion that narrow categorizers are 
more analytic in information processing since, according to the double-dominance model, the left 
hemisphere is predominant for analytic processing. As for the broad categorizers, the data look 
promising for our hypothesis concerning their relative dependence on right hemisphere in 
processing, but at this stage we cannot be sure that this dependence exists. The trouble arises 
from the small number (five) of broad categorizers who could be classified as consistent movers. 
There could, of course, be some significance in this fact, and the related one that these subjects 
made significantly fewer LEMs than did narrow categorizers. Whether these phenomena have to 
do with cerebral organization or with more restricted issues such as their susceptibility (or lack of 
it) to this type of experimental paradigm will have to await further research. 

Analyses of question type failed to reveal any trend for this variable to produce systematic 
shifts. Narrow categorizers consistently depended on left hemisphere activity whatever the type 
of query. Broad categorizers, as a group, did not produce convincing evidence of right (or left) 
hemisphere dependence, but did tend, as individuals, to rely on one hemisphere in processing 
both verbal/numerical and spatial information (r = 0.641). These findings can be viewed as a 
special case of the perceptual orienting model proposed by Kinsbourne (1973). His view is that if 
one cerebral hemisphere is at a higher level of activation than the other, information processing 
will be faster in that side in general. Bruce & Kinsbourne (1974) demonstrated hemispheric 
enhancement by having subjects attempt to recognize complex visual patterns with or without a 
list of common words held concurrently in memory. From the results it seems that concurrent 
verbal activity produces better recognition of visual forms in the left hemisphere, as opposed to 
the usual finding of right hemisphere superiority. It appears that in our study individual subjects 
tended to rely on one hemisphere or the other (confirming Gur & Gur’s (1975) finding that this 
happens in face-to-face questioning), and that the choice of hemisphere is partially predictable 
from knowledge of a subject’s category width score: narrow categorizers are likely to use the 
left hemisphere. 

This study has added a new factor, cognitive style of the individual, to the list of determinants 
of hemispheric involvement in information processing. Apparently, the differential activation of 
the two hemispheres can be predicted not only from task type (Gur et al. 1975; Kocel et al. 
1972; Weiten & Etaugh, 1974), task difficulty level (Patterson & Bradshaw, 1975), personality 
characteristics (Bakan & Shortland, 1969) or differential learning experiences (Bever & Chiarello, 
1974), but also from the cognitive style of the individual, measured in this case by Pettigrew’s 
C-W Scale. However, the links between cognitive style and hemispheric functioning must be 
seen as tenuous at this stage, precisely because of the lengthy list of predictors (as well as for 
any residual uncertainty about the significance of LEMs). Future research should be directed at 
ascertaining the degree to which these variables are independent determinants of hemispheric 
activation. 
Residual effects in recall after a stimulus suffix 


David Salter and Jim Osler 


Two experiments investigated serial recall with eight-word lists in which the frequency rating of the terminal 
word was manipulated. The effect on recall of two kinds of verbal stimulus suffix as well as a control noise 
suffix was also tested. Recall for the terminal items in the lists was analysed. Experiment I showed that 
interference with a verbal suffix, irrespective of the nature of the suffix, was at least 40 per cent compared 
with a noise suffix for homogeneous lists in which the terminal list-word was a member of the same standard 
recall set as the previous words in the list. This suffix interference was considerably reduced with 
heterogeneous lists in which a new terminal list-word was introduced. However, the reduction in interference 
was less when the verbal suffix itself was also a new word compared with a standard verbal suffix. 
Experiment II introduced an additional end-of-list cue before the suffix, and demonstrated that this cue did 
not affect the pattern of findings of the previous experiment. In the heterogeneous lists of both experiments, 
new words of high frequency were not differentiated in recall from new words of low frequency, but the 
latter condition was characterized by an interesting increase in recall error at earlier serial positions in the 
list. 


The observation that better recall over the last few serial positions occurs with auditorily 
presented lists compared with visually presented ones goes back at least to Washburn (1916), 
who remarked on the superiority of the auditory after-image to the visual one. This original 
observation has been further defined by Crowder & Morton (1969) and Morton (1970 b; 1976), 
who proposed a storage mechanism, Precategorical Acoustic Storage (PAS), that is specific to 
speech. PAS registers the phonological features from a speech input and is sensitive to acoustic 
features such as the relative spatial location or the pitch of the stimulus but the storage 
mechanism is prelinguistic and the analysis of semantic variables does not lie within its domain. 
The spoken input remains available in PAS for some short period after presentation, irrespective 
of whether the material has been recognized and understood, or not. 

The advantage accruing from PAS is illustrated best with supra-span lists of items for 
immediate serial recall. In this context, the terminal item in auditorily presented lists shows 
markedly superior recall compared with the final item in visually presented lists (it is of no 
consequence whether the experimenter or the subject voices the list for recall). 

However, the superior recall with acoustic items can be eliminated by the addition of a 
stimulus suffix, which is an irrelevant item at the end of the list that does not require a response 
(Crowder & Morton, 1969). It was on the basis of suffix effects that Crowder & Morton 
postulated the existence of PAS. A major experimental review of their findings can be found in 
Morton, Crowder & Prussin (1971). 

The massive decrement in recalling the last item in the list, which was caused by a verbal 
suffix, suggested initially that information about the final item was stored only at a relatively 
unprocessed level of coding, which was precategorical and acoustic. Yet other evidence (Salter, 
1975; Salter, Springer & Bolton, 1976) demonstrated that the final item can be recalled with high 
probability despite the presence of an additional verbal suffix. Two reasons are proposed why this 
effect obtained. The first is that more extensive coding of the final list item allowed recall to be 
based on attributes other than those related to precategorical coding. This explanation agrees 
with proposals made by Morton (1970 b) and Glanzer (1972) that serial recall is based on 
secondary memory as well as primary memory, and allows the PAS mechanism to remain extant 
(see Salter et al. 1976, fig. 4). The second reason to explain this effect was suggested by Routh 
(personal communication), who proposed that a categorically heterogeneous item at the terminal 
serial position signals the end-of-the-list so that the ensuing suffix can be effectively ignored. 


Experiment I 

The design of this experiment is relevant to both these alternatives. It compared the recall for a 
final item which was old, that is one of the recall set, with the recall for a final item which was 
new and which also varied in its relative frequency of occurrence in the language. 

Furthermore the experiment investigated the effects of manipulating the verbal suffix. There is 
good evidence (Morton et al. 1971) that verbal suffix interference operates primarily along 
physical dimensions such as location and pitch; the more closely the suffix is physically identified 
with the stimulus list, the more effectively it masks the final item. When Morton et al. (1971, 
Expts V, VI, XIV) tested cognitive dimensions of the suffix with factors such as semantic 
similarity, frequency and emotionality, no systematic effects were obtained which could be 
attributed to these ‘intrinsic’ properties of the suffix compared with its ‘extrinsic’ properties. Yet 
if the introduction of a new, final item causes that item to be more extensively coded, making 
interference at a precategorical level irrelevant to recall performance (Salter, 1975), we thought 
that introducing a new suffix might similarly increase its potential for recall interference. We 
therefore compared a condition in which the same verbal suffix tagged the lists with a condition 
in which a new suffix was used each time a list was presented. On the other hand, if a new 
terminal item functions as an end-of-list marker so that the suffix can be effectively ignored, then 
it will be irrelevant to recall whether the suffix is an old or a new word. 


Method 


Materials. Stimulus lists were eight words long. The words for the first seven serial positions were selected 
from a set of nine words which were monosyllables with four letters: book, face, half, line, name, plan, room, 
ship, wish. Each word occurred more than 200 times per million in the Thorndike-Lorge word counts. The 
first letter was a consonant which was different for each word in the set. With repeated presentation these 
words became old items to the subjects. 

The eighth and last word in the list was also a four-letter word with an initial consonant. It was drawn 
from one of three stimulus categories: either (a) one of the old words (OLD 8) listed above, or a new word, namely (b) 
a high-frequency word (NEW 8HF) which occurred more than 200 times per million in the Thorndike-Lorge 
word counts, or (c) a low-frequency word (NEW 8LF), defined as a word occurring between once and ten 
times per million. New words were selected at random from lists of HF and LF words which satisfied the 
selection criteria. 

The suffix was also varied. It could be either a burst of white noise (NOISE) 150 msec in duration, or the 
word done (SAME), or a very low-frequency word (NEW) which was a monosyllabic word that occurred less 
than once per million. 

Stimulus lists were prepared so that no word occurred more than once in any list. Lists were recorded 
onto magnetic tape in an English male voice at the rate of 2 items/sec. The word ‘Ready’ preceded each list 
by 3 sec and 15 sec elapsed between list presentations for the subjects’ recall responses. 


Design. Nine experimental conditions were established by combining the three categories of eighth list word 
with the three types of suffix. These were tested in five blocks of 20 stimulus lists. The first two trial lists in 
each block were not scored. Of the remaining 18 lists in each block, there were two lists for each of the nine 
conditions, which were presented in a random order to distribute condition-order effects with no condition 
immediately repeated. Five different block orders were presented to separate subject groups. One practice 
block was given before the five experimental blocks. 


Procedure. Subjects were seated at individual desks. Extraneous noise was effectively eliminated by using 
headphones through which the stimulus lists were presented binaurally from a Revox tape-recorder. 

The set of nine words were listed in the instructions. Subjects were allowed to respond with only the first 
one or two letters for the first seven items, but were required to write down the eighth word in full. The 
instructions emphasized that the suffix word should not be written down, that list recall should be made in 
the order of presentation, and that items were to be recalled in the correct serial position. Lists were written 
on prepared response sheets, and covered from view before the next list was recalled. The block of practice 
lists was given, and testing of the experimental lists proceeded with a short rest period between blocks. 
Subjects. There were 20 subjects aged from 18 to 44 years: 8 males and 12 females. They were tested in 
group sessions lasting approximately an hour. 


Results 


The recall lists were scored by counting any item which was not entered into the correct serial 
position as an error. These error data were averaged across subjects for each serial position 
(SP). Serial position curves are displayed in Fig. 1 depending on whether the final list-word was 
a member of the recall set (OLD 8) or a new high-frequency word (NEW 8HF) or a new 
low-frequency word (NEW 8LF), and depending on whether the suffix was the SAME word or a 
NEW low-frequency word or a NOISE burst. Each data point is based on 200 observations. 
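
The figure of 200 observations per data point can be checked against the Design and Subjects paragraphs. The sketch below is illustrative only: the variable names are ours, and it assumes that every subject contributed every scored list.

```python
from itertools import product

# Factor levels as named in the Materials paragraph.
FINAL_ITEMS = ["OLD 8", "NEW 8HF", "NEW 8LF"]
SUFFIXES = ["NOISE", "SAME", "NEW"]

# Counts stated in the Design and Subjects paragraphs of Expt. I.
N_SUBJECTS = 20
N_BLOCKS = 5
LISTS_PER_BLOCK = 20
UNSCORED_PER_BLOCK = 2  # the first two lists in each block were not scored

# Crossing the two factors gives the nine experimental conditions.
conditions = list(product(FINAL_ITEMS, SUFFIXES))

# 18 scored lists per block, shared equally among the nine conditions.
scored_per_block = LISTS_PER_BLOCK - UNSCORED_PER_BLOCK
lists_per_condition = scored_per_block // len(conditions)

# Observations behind each point on a serial position curve
# (assumes no missing subjects or lists).
observations_per_point = N_SUBJECTS * N_BLOCKS * lists_per_condition

print(len(conditions), lists_per_condition, observations_per_point)
```

With two lists per condition in each of five blocks, each of the 20 subjects supplies ten observations per condition at every serial position.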
Figure 1. Serial position curves for immediate recall of eight-word lists in Expt. I for combinations 
of frequency of the final word in the recall list (OLD 8, NEW 8HF and NEW 8LF) with suffix types: 
NOISE, the SAME suffix word, and a NEW suffix word. 


Table 1. Mean proportion of recall errors for serial position eight (SP8) in Expt. I 

Final recall            Stimulus suffix 
item             NOISE     SAME      NEW 
OLD 8            0.07      0.56      0.52 
NEW 8HF          0.10      0.26      0.41 
NEW 8LF          0.10      0.24      0.37 
To test the experimental manipulations of the eighth item and of the suffix, a three-factor 
analysis of variance was conducted with one variable between subjects and two variables within 
subjects. This analysis used the number of errors made by each subject on the final serial 
position (SP8). Fixed effects were assumed. The presentation order of trial blocks did not reveal 
significant variation between groups (F = 0.23, d.f. 4, 15). The two within-subject factors 
produced significant variation for the terminal list-item conditions (F = 33.7, d.f. 2, 30, P < 0.001) 
and for conditions changing the suffix (F = 51.5, d.f. 2, 30, P < 0.001). Their interaction was also 
significant (F = 10.7, d.f. 4, 60, P < 0.001). No other interactions were significant. 
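
The degrees of freedom quoted above are those expected from a design with one between-subjects factor (block order, five groups) and two crossed within-subject factors of three levels each. The following sketch reproduces them; the function name and the choice of error terms (each within-subject effect tested against its own interaction with subjects within groups) are our assumptions, not a description of the authors' computation.

```python
def split_plot_df(n_subjects, n_groups, a_levels, b_levels):
    """Degrees of freedom (effect, error) for a mixed design with one
    between-subjects factor and two crossed within-subject factors.

    Assumes equal group sizes and separate error terms for each
    within-subject effect (effect x subjects-within-groups).
    """
    error_between = n_subjects - n_groups  # subjects within groups
    df_a = a_levels - 1
    df_b = b_levels - 1
    return {
        "groups": (n_groups - 1, error_between),
        "A": (df_a, df_a * error_between),
        "B": (df_b, df_b * error_between),
        "A x B": (df_a * df_b, df_a * df_b * error_between),
    }


# Expt. I: 20 subjects in 5 block-order groups, 3 final-item types, 3 suffix types.
expt1 = split_plot_df(n_subjects=20, n_groups=5, a_levels=3, b_levels=3)
print(expt1)
```

For Expt. I this gives 4 and 15 for block order, 2 and 30 for each within-subject factor, and 4 and 60 for their interaction, in agreement with the values reported.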

Analysis of the interaction between the experimental treatments was then made using the 
Newman-Keuls test. For convenience the relevant data are summarized in Table 1. 

The noise suffix had almost no effect as a function of the type of final recall item. Although 
this could be due to a floor effect, we note in anticipation that the second experiment yielded 
similar results for conditions in which no floor effect occurred. All the noise 
suffix conditions were characterized by better recall over the last few serial positions and for the 
final serial position particularly. In this respect the data were in agreement with previous 
findings. With homogeneous lists in which the terminal item was one of the recall set (OLD 8), a 
verbal suffix produced a significant decrement in recall (P < 0.01) compared with a noise suffix; 
the interference was about 45 per cent irrespective of the nature of the verbal suffix. This recall 
decrement was reduced considerably with heterogeneous lists in which a new item was 
introduced (NEW 8HF and NEW 8LF). With the heterogeneous lists, the new, final item was 
recalled better with a NOISE suffix than with a verbal suffix that remained constant (NEW 8HF 
P < 0.01; NEW 8LF P < 0.05), and recall for the final item was worse when it was followed by a 
NEW word suffix than by the SAME word suffix (NEW 8HF P < 0.01; NEW 8LF P < 0.05). The 
effects remained constant irrespective of the frequency level of the new eighth item in the recall 
list. 
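
The size of the suffix interference can be read off Table 1 as the difference in SP8 error proportion between each verbal suffix and the noise control. The sketch below makes that arithmetic explicit. Treating interference as a simple difference of proportions is our assumption about how the percentages were obtained, and the names are illustrative.

```python
# SP8 error proportions transcribed from Table 1 (Expt. I).
TABLE_1 = {
    "OLD 8": {"NOISE": 0.07, "SAME": 0.56, "NEW": 0.52},
    "NEW 8HF": {"NOISE": 0.10, "SAME": 0.26, "NEW": 0.41},
    "NEW 8LF": {"NOISE": 0.10, "SAME": 0.24, "NEW": 0.37},
}


def suffix_interference(table):
    """Error proportion with each verbal suffix minus the noise baseline."""
    return {
        item: {
            suffix: round(row[suffix] - row["NOISE"], 2)
            for suffix in ("SAME", "NEW")
        }
        for item, row in table.items()
    }


interference = suffix_interference(TABLE_1)
for item, diffs in interference.items():
    print(item, diffs)
```

On this reading the homogeneous lists show differences of 0.49 and 0.45 for the SAME and NEW suffixes, whereas the heterogeneous lists show 0.16 and 0.14 for the SAME suffix and 0.31 and 0.27 for the NEW suffix.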


Discussion 


Recall over the last few items in the homogeneous lists, OLD 8, provided a data display (see Fig. 
1) which is now familiar in suffix work: namely, that a verbal suffix grossly interfered with final 
item recall compared with a noise suffix and that recall at SP8 was not differentiated by the 
‘intrinsic’ properties of the suffix. The SAME word suffix and a NEW one were equally effective in 
blocking final item recall. There was some indication that a new suffix may have interfered with 
recall at the early serial positions (see OLD 8 and NEW 8LF in Fig. 1) but the effect was not 
consistent, and is therefore merely noted. 

In the two sets of heterogeneous lists, the new final item was much better remembered than 
the old final item in the homogeneous lists yet its frequency level did not affect recall. This 
result accommodated a hypothesis that the heterogeneous item cued the end of the recall list so 
that attention was directed away from the suffix and its interference was thereby attenuated. A 
similar effect was reported by Morton (1970 a) who therefore concluded (Morton et al. 1971, 
Expts. VI-X) that PAS was functionally located beyond an attention filter mechanism 
(Treisman, 1960; Broadbent, 1971). If the heterogeneous item allowed attention selectively to 
bypass the suffix in some way so that it was not processed, then the frequency of the final item 
would then become a relatively less important factor in recall. The serial curves for NEW 8HF and 
NEW 8LF in Fig. 1 provided partial support for this interpretation. (The fact that subjects were 
required to write out the final recall word in full may have also contributed to its functioning as 
a marker in this way.) Yet, a signalling hypothesis did not account for the differentiation between 
the two kinds of verbal suffix in the heterogeneous lists. This difference could be interpreted 
plausibly enough by a post hoc explanation of confusion between a new suffix combined with a 
new final item, which led to the increase in error observed, whereas a new terminal item with 
the same suffix repeated would result in a reduction of error. However, in both instances this 
required the suffix to be processed, which a signalling hypothesis denied. 


Experiment II 

The purpose of this experiment was therefore twofold. First, it was to see whether recall with 
the heterogeneous item remained better when the written response to it was similar to any other 
item in the recall procedure. Second, we wished to see whether the introduction of an 
unequivocal signal, which cued the end of the recall list and acted as a device to bypass any 
processing of the suffix, would thereby bring about an improvement in recalling the final item 
irrespective of whether it was an old or new word. 


Method 


Materials. The stimulus lists were identical to those used in the previous experiment, but the method of 
presentation introduced new features. Lists were recorded onto magnetic tape in a female voice with 
presentation at a rate of one item per 0.6 sec (the verbal suffix, or control noise-burst, was recorded at the 
same ISI as the previous list items). A more important innovation was the introduction of a brief tone cue, 
generated from a Birkbeck timer, which sounded two seconds before list onset as a preparatory warning 
signal and which sounded once again between the eighth word and the suffix to mark the end of the recall 
list. The tone cue was located between the list and suffix so that it did not interfere with identification. The 
pair of tones was inserted to act like auditory parentheses around the recall list itself. 


Design. The nine experimental conditions were presented in six blocks of 20 lists. The first block was 
included for practice and was not scored. The order of presenting the remaining five blocks was varied 
across five separate groups of subjects. Within each experimental block, the first two lists were discarded; 
the remaining 18 lists were distributed equally among the experimental conditions. 


Procedure. Subjects were tested in groups, seated at separate desks. The taped lists were relayed binaurally 
via headphones. A new list was presented when subjects had finished writing out their responses to the 
previous list. The presentation sequence was controlled by the experimenter who monitored the experiment 
throughout and ensured that the instructions were followed. 

In this experiment the instructions did not list the ‘familiar’ word set from which the first seven serial 
positions of the stimulus lists at least were generated. Subjects were informed only that all list words were 
monosyllables of which most were familiar words but that some unfamiliar words would occur as well. No 
distinction was made in the method of responding between the first seven list items and the final list item 
(see Expt. I); all words were written out in full. These revisions were introduced because we did not wish to 
draw attention unnecessarily to the final list word. The instructions were deliberately neutral about the suffix 
item, and specified that the list of eight words would be followed by an extra item, the suffix, which was a 
signal to begin recall. It was emphasized that subjects should recall the list items only, and that they were 
not required to respond to the item which occurred after the second tone cue. The usual requirements for a 
suffix experiment were included: to remember the first few items correctly and then any others that could be 
recalled, to set down the words in the order they were presented without backtracking, and to enter each 
word in its correct serial position. Subjects were encouraged to guess if they were not sure. The complete 
session lasted for just over the hour, beginning with the practice block and then the five experimental blocks. 
A short rest period was given between blocks. 


Subjects. The 15 subjects were pupils from a local school in Newcastle: six boys and nine girls. Their ages 
ranged from 16 years 8 months to 18 years. 


Results 


In scoring the recall lists, any item which was not entered into the correct serial position was 
scored as a whole error, except where the item was entered into the serial position immediately 
adjacent to the correct one, when it was scored as a half-error. This marking system 
extracted the maximum error information from the data, which were characterized by many more 
errors than in the previous experiment (see Subjects). However, the data from both marking 
systems, whole and whole-plus-half errors, yielded the same pattern of results from the data 
analysis. The reliability of these effects was assessed by subjecting data from both scoring 
methods to an analysis of variance. This tested across groups between presentation orders of 
stimulus blocks, and within subjects between the two treatment factors: frequency classes at SP8, 
and suffix type. Fixed effects were assumed for these factors. Both sets of error data yielded 
similar analyses for all the results mentioned; therefore only the data which included adjacent 
errors are reported. Serial position curves are displayed in Fig. 2 for treatments at SP8 and for 
the suffix with these data averaged over subjects. Each data point represents the average of 150 
observations. 
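
The two marking systems can be stated as a short scoring routine. This is a minimal sketch of the rule as described above, not the authors' procedure: the function name, the representation of a blank response as an empty string, and the example response are all hypothetical.

```python
def score_list(presented, recalled, half_for_adjacent=True):
    """Return one error score per serial position for a single list.

    0.0 = item entered in its correct position;
    0.5 = item entered one position away (only if half_for_adjacent);
    1.0 = any other outcome (whole error).
    Assumes no word occurs twice in a list, as stated in the Materials.
    """
    scores = []
    for pos, item in enumerate(presented):
        if pos < len(recalled) and recalled[pos] == item:
            scores.append(0.0)
        elif half_for_adjacent and any(
            0 <= p < len(recalled) and recalled[p] == item
            for p in (pos - 1, pos + 1)
        ):
            scores.append(0.5)
        else:
            scores.append(1.0)
    return scores


# Hypothetical response sheet: two items transposed, one item shifted
# a position late, and the final item omitted.
presented = ["book", "face", "half", "line", "name", "plan", "room", "ship"]
recalled = ["book", "half", "face", "line", "name", "plan", "", "room"]

with_half = score_list(presented, recalled)
whole_only = score_list(presented, recalled, half_for_adjacent=False)
print(with_half)
print(whole_only)
```

Under the whole-plus-half system the transposed and displaced items count as half-errors, whereas under the whole-error system used in Expt. I each counts as a full error.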
Figure 2. Serial position curves for immediate recall of eight-word lists in Expt. II for combinations of 
frequency of the final word in the recall list (OLD 8, NEW 8HF and NEW 8LF) with suffix type: 
NOISE, the SAME suffix word, and a NEW suffix word. 


Table 2. Mean proportion of recall errors for serial position eight (SP8) in Expt. II 

Final recall            Stimulus suffix 
item             NOISE     SAME      NEW 
OLD 8            0.30      0.79      0.82 
NEW 8HF          0.31      0.54      0.58 
NEW 8LF          0.33      0.50      0.61 
Changing the presentation order between groups did not produce reliable variation but both 
the within-subject factors did. Changing the item at SP8 in the recall list produced a significant 
effect (F = 15.2, d.f. 2, 20, P < 0.001), and changing the type of suffix affected recall of the 
last item (F = 73.5, d.f. 2, 20, P < 0.001); the interaction of these two factors was also significant 
(F = 8.8, d.f. 4, 40, P < 0.001). More precise analysis of this interaction was carried out 
by comparing individual treatments with the Newman-Keuls test. Table 2 presents a summary of 
SP8 error probabilities for recall in the nine conditions. 

In all three sets of data in Fig. 2, recall at SP8 was adversely affected by a verbal suffix 
compared with a noise suffix (homogeneous and heterogeneous lists, P < 0.01). However, recall 
suffered less interference from the verbal suffix with a heterogeneous item than with a 
homogeneous one (P < 0.01). The two word suffix treatments (SAME vs. NEW) were individually 
compared in the three sets of data, and were found to be not differentiated from each other 
statistically. Nevertheless, the pattern of recall for the verbal suffix conditions remained 
consistent with that found in the first experiment, and a prediction based on the former results 
that recall would be better with the SAME word suffix, irrespective of list type, was confirmed 
(Wilcoxon, P < 0.025, one-tailed). 


Discussion 


The treatment manipulation at SP8 produced a remarkably similar pattern of results in both 
experiments despite the introduction of an interpolated tone cue to signal the end-of-list in the 
second experiment. As an end-of-list signal, the tone cue was unequivocally distinct acoustically 
from the adjacent words. Yet, the results in the latter experiment, particularly those for 
homogeneous lists, were clear cut in showing that a tone cue was not utilized to switch attention 
away from suffix interference. It has been suggested that the processing of such a cue is carried 
out in a functionally separate part of the nervous system (Morton, 1968, 1970; Rowe & Rowe, 
1976), although this view has been queried by Salter et al. (1976, first proposal, p. 348). If it 
is correct that the processing of non-verbal information is functionally isolated, then the 
separation possibly detracted from the efficiency of the cue to act as a temporal marker between 
list and suffix. (For related experimental work, see Ladefoged & Broadbent, 1960; Bertelson & 
Tisseyre, 1970.) However, in our procedure the tone cue was located in the same auditory 
channel, and the list words were not connected grammatically so that any temporal displacement 
of the cue was minimized. We therefore conclude that any switching of attention which was 
possible with a tone cue proved insufficient to prevent interference from a verbal suffix. 

We now turn to the question whether the heterogeneous item itself was sufficient to evade 
suffix interference. Presentation of the new final list-item might allow attention to be switched 
almost immediately at a precategorical level because the heterogeneous item was also 
differentiated acoustically from the more familiar members of the recall set. This hypothesis 
encompasses the findings of Salter (1975) in which heterogeneity was based ostensibly on an 
alphanumeric distinction, and Salter et al. (1976) in which the odd item was either a nonsense 
word or a meaningful word. The evidence reported in this paper ran counter to this explanation 
because a distinction between a new and an old suffix affected recall. Furthermore, other 
attempts at training subjects to use a switching strategy to evade auditory suffix interference 
have not succeeded (Hitch, 1975). A consensus of the evidence therefore seems to favour a 
coding hypothesis rather than a signalling hypothesis although the latter cannot be rejected 
outright. The final item thus appears to be registered automatically at a post-categorical level 
with sufficient information for reasonably good recall although its frequency in the language did 
not affect its own recall level. 

Yet having concluded that frequency was not operative, we retract it immediately. Although 
the low-frequency words themselves were not worse recalled, an interesting related effect was 
observed: in condition NEW 8LF the error probability increased at SP2 and SP3, and this effect 
was replicated in both experiments as the composite serial curves in Fig. 3 show. 
Figure 3. Serial position curves, collapsed across suffix type, as a function of the final word in the recall list: 
a word from the recall set (OLD 8); a new, high-frequency word (NEW 8HF); and a 
new, low-frequency word (NEW 8LF). Data for Expts I and II. 


In Expt. I the two frequency treatments diverged at SP3 (Wilcoxon, P < 0.01, two-tailed), and 
in Expt. II recall in the low-frequency condition was worse at SP2 (Wilcoxon, P < 0.02, 
two-tailed) and SP3 (Wilcoxon, P < 0.01, two-tailed). The effect was thus replicated and 
extended. 

This effect can be explained by attention being transferred from the first items on the list to 
the low-frequency word at SP8. We assume that subjects followed the instruction to concentrate 
on remembering the first few items for serial recall. We assume too that these early items in all 
lists were processed at presentation in the same manner because there was no way of the subject 
knowing at this point in time whether the terminal item was to be a homogeneous word, or a 
heterogeneous word of high or low frequency. If we further assume that low-frequency words 
made some extra demand on processing capacity at presentation and this required attention (see 
Watkins, 1974; Routh, 1976 and Salter et al. 1976 for further discussion), then the consequent 
division of limited capacity explains the information loss from the items being maintained at 
earlier serial positions. Anderson & Craik (1974) postulated that items to which the subject pays 
attention are held in primary memory and that the amount of forgetting from primary memory is 
related to the demands of a subsidiary task on available processing capacity. It is preferable, 
perhaps, to think of the low-frequency word as diverting attention away from maintenance and 
rehearsal operations, irrespective of where the information is stored (Brodie, 1975). Comparable 
recall performance at SP8 across the two frequency levels was thus achieved at the cost of 
earlier items which were most at risk in the low-frequency lists. If this effect is sustained by 
further evidence, the proposal of automatic registration (Salter, 1973; Salter et al. 1976, p. 348) 
stated earlier would require suitable qualification. 
Manipulating the word suffix itself also led to some differences in final item recall. These data 
were sufficiently intriguing to warrant comment, and possibly further investigation, because the 
more cognitive characteristics of the suffix word have been deemed irrelevant in determining the 
interference effect. In Expt. I, the difference between the two verbal suffix treatments was 
statistically reliable for individual comparisons in the heterogeneous lists, and a similar difference 
was also found when the lists were combined in Expt. II. This consistent variation across both 
experiments was noteworthy for the implication it contains: that the suffix word underwent more 
than cursory processing and could interfere at coding levels other than PAS coding. The 
experimental review by Morton et al. (1971, p. 185) reported that the cognitive attributes of the 
suffix word were irrelevant to its effect, and the present findings for condition OLD 8 support 
that conclusion; the effects linked to the category of verbal suffix in these data speak only to the 
heterogeneous lists. 

In conclusion, we wish to isolate three main factors of recall, which seemed to be operating in 
this study: 

First, information about the acoustic parameters of the stimulus remained available for a 
limited time span after presentation so that processing with attention could be delayed or even 
repeated. Presentation of another item or a stimulus suffix with similar acoustic qualities 
effectively displaced the previous acoustic information. 

Second, a new item that was presented at the terminal serial position effectively reduced the 
recall decrement usually observed for this item with a verbal suffix. 

Third, the encoding of a rare word at the terminal recall position seemed to require that 
attention was transferred from items already being maintained in memory. The removal of 
attention from these stored items jeopardized the recall of the weakest items in a serial response 
procedure. 

The first factor describes the effect of a verbal suffix compared with a noise suffix. The second 
was demonstrated by the fact that differences in recall occurred even though a verbal suffix was 
supposed to displace the acoustic information about the final list-items. The third factor explains 
the curious loss of items at earlier serial positions when the final list-item was a relatively 
unknown word. 
The tachistoscopic recognition of letters under whole and partial report 
procedures as related to intelligence 


James L. Mosley 





Investigations of the short-term memory task performance of retarded individuals have indicated that these 
individuals demonstrate a deficit in the mechanisms necessary for the acquisition, storage and/or retrieval of 
information. The present study examined the tachistoscopic letter recognition task performance of retarded 
and non-retarded individuals under a partial report and a whole report procedure. The results revealed that 
the retarded subjects did significantly more poorly relative to the non-retarded subjects under both 
procedures. The data were interpreted as indicating that the retarded subjects were inefficient in their 
strategy to make the simultaneous input task manageable. Further, the data provided no support for the 
suggestion that a visual-to-auditory encoding process exists between iconic and short-term memory. 





Recent conceptualizations of information processing view the human organism as a processing 
system in which sensory information undergoes ‘several transformations or operations’ before it 
is permanently stored in memory. It has been demonstrated that adults are capable of taking in
large amounts of visual information but that this information is rapidly lost during the first 250
msec after stimulus offset (Sperling, 1960; Averbach & Coriell, 1961; Keele & Chase, 1967;
Sheingold, 1973). Both the rapid decay of visual information and the interest in the course of
information loss over time have led to the development of the partial report technique where 
subjects are cued at variable intervals following stimulus offset to report only a portion of the 
visual stimulus (Sperling, 1960; Averbach & Coriell, 1961). 

Research indicating consistent performance superiority employing the partial report procedure 
relative to the whole report procedure (Sperling, 1960; Turvey & Kravetz, 1970; Doost &
Turvey, 1971; Eriksen & Colegate, 1971); research investigating the effect of backward visual 
masking on partial report performance (Turvey, 1973); and the demonstration that the size of the 
partial report superiority declines to zero as the cue delay increases from zero to approximately 
250 msec (Averbach & Sperling, 1961; Keele & Chase, 1967; Coltheart, 1975) provide evidence 
for the existence of an iconic memory where the visual information is held briefly in an 
unprocessed state which is both comprehensive and amenable to subsequent access. As such the 
iconic store is seen as the first step in a sequence of operations related to permanent memory 
storage. 

It has been suggested further (Sperling, 1963; Keele, 1973) that a visual-to-auditory encoding 
may occur between iconic memory and short-term memory. Evidence for this suggestion was 
generated by the work of Conrad (1964), who noted that the form of errors in recall is one way 
of assessing the nature of representation in short-term memory. Conrad (1964), employing two 
sets of letters that were similar within a set but not between sets, demonstrated that items
recalled incorrectly were acoustically related to the original items even though the original items
were presented visually. The implication was that the verbal material in iconic storage is 
transformed from a visual to an auditory code. 

Many investigators studying the nature of short-term memory in retarded individuals have 
suggested that these individuals demonstrate a deficit in the mechanisms necessary for the 
acquisition and storage of information (Ellis, 1970; Olson, 1971; Kellas, Ashcroft & Johnson, 
1973) and/or with retrieving information from short-term memory (Butterfield, Wambold & 
Belmont, 1973; Dugas & Kellas, 1974). However, the majority of the short-term memory 
investigations have neither employed tasks sensitive enough to allow for the assessment of 
iconic storage nor employed strategies designed to determine the nature of the iconic storage - 
short-term memory encoding process. The few studies which have attempted to assess iconic
memory in retarded individuals (Libkuman & Friedrich, 1972; Pennington & Luszcz, 1975) have
indicated that retarded individuals demonstrate a quantitative as opposed to a qualitative 
difference in iconic memory relative to non-retarded individuals. 

Research in the area of iconic memory typically controls visual input by presenting stimuli 
tachistoscopically for durations too brief to permit eye movement, since such movement plays 
an important role in the perception of visual stimuli (Averbach & Coriell, 1961). In addition, 
there are data which suggest that scanning strategies change with chronological age 
(Zaporozhets, 1965; Vurpillot, 1968) and that these strategies facilitate improved visual 
discrimination; the tachistoscopic presentation of the visual stimuli at exposures of
approximately 100 msec has therefore been suggested (Averbach & Coriell, 1961).

In an effort to delineate the differences that underlie the reported poor short-term memory 
capabilities of retarded individuals relative to non-retarded individuals the tachistoscopic 
presentation of verbal stimuli will be combined with the partial report and whole report 
procedures. A comparison of performance under each of the partial report and whole report 
procedures will yield an index of iconic memory and short-term memory respectively. By 
examining the pattern of errors of commission under the whole report procedure a basis for 
assessing the encoding process between iconic and short-term memory can be established. 


Method 
Subjects 


Six male and four female trainees (mean CA = 23.26 years; S.D. = 3.51 years) from the Vocational and
Rehabilitation Research Institute, Calgary, Alberta, served as subjects. These individuals were selected 
because they demonstrated no obvious sensory deficits; they were not taking medication on a continuing 
basis nor at the time of the study; their clinical records were not such as to suggest the presence of organic 
etiological factors and their intelligence quotients were above 50 IQ points on one of three individually 
administered standardized intelligence tests (the Wechsler Adult Intelligence Scale; the Stanford-Binet; the 
Ravens Progressive Matrices) as determined from their clinical records. A current estimate of mental age 
(MA) and IQ was obtained by administering Form A of the Peabody Picture Vocabulary Test (PPVT). The 
mean PPVT MA was 11.08 years (S.D. = 1.85 years) and the mean PPVT IQ was 71.2 (S.D. = 8.99).

The letter recognition ability of the retarded subjects was assessed by presenting each of the 26 letters 
singly on cards in a random sequence to each subject prior to the initial experimental session. The letter 
script, size and viewing distance were such as to approximate the experimental stimulus conditions. The
exposure time for each letter was less than 4 sec. All subjects were able to identify each of the letters. 

Each of the retarded subjects was also tested for near binocular acuity employing a Bausch & Lomb 
Master Ortho-Rater (No. 71-21-40-65). The median acuity rating (Snellen Notation) for the retarded sample
was 20/29 with a range from 20/33 to 20/20. The visual angle equivalents for this Snellen Notation range
are 1.67 min to 1.0 min respectively. Since each letter within the stimulus arrays subtended visual angles of
12.8 min both horizontally and vertically, the near binocular acuity of the retarded subjects was sufficient to
allow for adequate task performance. 

Ten non-retarded subjects (six males and four females) were selected from the student population at the 
University of Calgary. These subjects were matched to the retarded individuals according to chronological
age (mean CA = 23.08 years; S.D. = 3.63 years).


Apparatus 


A Scientific Prototype Three-Channel Tachistoscope (Model GB) was employed. Field exposure durations 
were controlled through the three solid state tachistoscopic timers such that the stimulus array field and the 
marker field were illuminated for 100 msec and 300 msec respectively. A blank field of 50 msec duration was 
interposed between the stimulus array and the marker fields. The illumination for all three fields was held 
constant at 20 ft L. A dim field with a fixation cross was employed to allow each subject to focus prior to 
the stimulus array presentation. The stimulus presentation sequence was activated when the subject pressed 
a button on a hand-held switch when he was ready for each trial. To prevent the accidental triggering of the 
stimulus array-blank field-marker field sequence by the subject, a circuit switch was employed by the
experimenter which had to be activated before the subject switch could trigger the sequence. 

The stimulus arrays consisted of two sets of 50 lettered 5 x 7 white cards. The lettering was Helvetica 24
CLN (medium size). For the whole report set the entire 26 letters of the alphabet were employed. For the
partial report set 25 letters were employed excluding the letter I. Each stimulus array card contained two
rows of five randomly selected equally spaced letters with the entire array measuring 6.35 cm horizontally
and 2.22 cm vertically. As viewed at 127 cm the subtended visual angles were approximately 2° 50' and 1° 1'
respectively with each letter subtending a visual angle of approximately 12.8' both horizontally and
vertically. No letter appeared twice in any one stimulus array.
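
The reported visual angles can be checked against the stated array dimensions and viewing distance. The following Python sketch is an illustration only and is not part of the original method. The function names are hypothetical, and it assumes the standard relation angle = 2 arctan(size / (2 x distance)).

```python
import math


def visual_angle_arcmin(size_cm, distance_cm):
    """Visual angle (minutes of arc) subtended by an object of a given size
    viewed from a given distance, both in centimetres."""
    angle_rad = 2 * math.atan(size_cm / (2 * distance_cm))
    return math.degrees(angle_rad) * 60


def size_for_angle_cm(angle_arcmin, distance_cm):
    """Inverse relation: physical size (cm) that subtends a given visual angle."""
    angle_rad = math.radians(angle_arcmin / 60)
    return 2 * distance_cm * math.tan(angle_rad / 2)


VIEWING_DISTANCE_CM = 127  # viewing distance stated in the text

array_width = visual_angle_arcmin(6.35, VIEWING_DISTANCE_CM)
array_height = visual_angle_arcmin(2.22, VIEWING_DISTANCE_CM)
letter_size = size_for_angle_cm(12.8, VIEWING_DISTANCE_CM)

print(f"array width:  {array_width:.1f} min of arc")
print(f"array height: {array_height:.1f} min of arc")
print(f"letter size:  {letter_size:.2f} cm")
```

Under these assumptions the array comes out at roughly 2° 52' by 1° 0', close to the approximate values given above, and a letter subtending 12.8' corresponds to a letter size of a little under 0.5 cm.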

In the partial report condition a visual indicator consisting of a black vertical bar which subtended a visual 
angle of approximately 4' horizontally and 12.8' vertically was employed. The indicator was positioned such
that it was placed above one of the letter positions in the upper row or below one of the letter positions in 
the lower row for each trial. The position of the marker on each trial was random with the restriction that 
the marker indicated each of the ten positions four times within a session consisting of 40 trials. 


Procedure 


Subjects were tested individually in one session per day on four consecutive days with each session 
consisting of 40 experimental trials and lasting an average of 55 min. For each of the partial report and 
whole report procedures the initial session included ten training trials prior to the experimental trials. All 
subjects performed under both the partial report and the whole report procedures. One-half of the subjects in 
each of the retarded and non-retarded groups received the partial report-whole report sequence with the 
remaining subjects receiving the reverse sequence.


Partial report procedure. For the partial report training trials the subject was asked to look into the 
tachistoscope and focus on the cross. When the cross was clearly in focus the subject was told that he 
would see two rows of five letters for a very very short time after pressing the hand-held button. The subject 
was informed that, immediately upon the disappearance of the letters, a marker would appear for a very 
short time. The marker would be above a letter position in the upper row or below a letter position in the
lower row, and the subject was to name the letter. The subject was told that the letters would appear only 
once so that it was necessary to watch the screen carefully when the button was pressed. The subject was 
also told that the letter ‘I’ would not appear. The subject was encouraged to do well and told that the task 
would become easier with practice. Following these instructions, questions were answered and the training 
trials were begun. The experimental trials were not started until the experimenter was satisfied that the 
subject was comfortable with the apparatus and understood the task required of him/her. In all cases the 
planned ten training trials were sufficient for this purpose. 


Whole report procedure. For the whole report training trials the subject was asked to look into the 
tachistoscope and focus on the cross. When the cross was clearly in focus the subject was told that he 
would see two rows of five letters for a very very short time after pressing the hand-held button. The subject 
was informed that, immediately upon the disappearance of the letters, he/she was to write the letters on a 
score sheet in their correct positions. The subject was also told that the letters would appear only once so 
that it was necessary to watch the screen carefully. The subject was encouraged to do well and was told that
the task would become easier with practice. Following these instructions, questions were answered and the
training trials were begun. The 40 experimental trials were started when the experimenter was satisfied that
the subject was comfortable with the apparatus and understood the task requirements. Ten training trials
were sufficient for this purpose.

For each of the first, second, third and fourth experimental sessions the sequence of the stimulus array 
cards was randomly changed by thoroughly shuffling the cards. 


Results 
Partial report performance 


Performance under the partial report procedure was assessed by examining the percentage of
correct responses across the three partial report experimental sessions (120 trials). A group
(retarded; non-retarded) by sequence (partial report-whole report; whole report-partial report)
by session (one; two; three) mixed analysis of variance revealed a significant group main effect
(F = 32.66, d.f. = 1, 16, P < 0.001) with the non-retarded subjects demonstrating superior overall
performance. The remainder of the main and interaction effects failed to reach statistical
significance (P < 0.05). Due to heterogeneity of variance, as indicated by a significant Fmax test
(P < 0.01), a square root transformation was applied to the percentage of correct response data
for this analysis.

Under the partial report procedure the subject must connect the bar marker location to a 
particular letter location thereby requiring the subject to retain the spatial location of the 
elements within the stimulus display. An examination of the spatial cue effects was carried out 
by plotting responses in terms of their absolute positions removed from the cued letter in each 
of the upper and lower lines of the stimulus display. Those reported letters not in the display 
were recorded as intrusions. The combined data for the three partial report experimental 
sessions (120 trials) are presented in Fig. 1. Those reported letters that occupied the 
inappropriate row when plotted in terms of their absolute positions from the cued letter 
produced a response pattern similar to that found in Fig. 1 but of considerably lesser magnitude 
and comparable for both groups. 


[Figure 1 plot: Responses (%) against absolute positions removed from cue, with separate curves for the non-retarded upper line, non-retarded lower line, retarded upper line and retarded lower line.]

Figure 1. The percentage of responses for the non-retarded and retarded groups as a function of line in the 
stimulus display and absolute position removed from the cued stimulus (zero positions removed indicates a 
correct response; INT equals per cent intrusions for both upper and lower lines). 


The response patterns presented in Fig. 1 suggest that neither group was responding
randomly to the stimulus displays. The patterns themselves reveal that the most likely errors 
were those associated with the letter positions immediately adjacent to the cued letter position 
with fewer errors occurring as the distance from the cued letter position increased. 

The number of intrusions supports the suggestion that the responses were not random. If the
subjects were guessing, the proportion of correct responses would be 1/25 or 4 per cent and the
number of intrusions would approximate 60 per cent, since only 10 of the 25 letters were randomly
presented in each stimulus array (the letter I was excluded), leaving 15 of the 25 absent from
any one display. The intrusions for the non-retarded
group numbered 123/1200 or 10.25 per cent with the retarded group yielding 318/1200 or 26.50
per cent. Both percentages were well below the 60 per cent value. The intrusion data suggest
that although the retarded group (X̄ = 31.8, S.D. = 12.23) responded with significantly more
intrusions (t = 4.57, d.f. = 18, P < 0.001) relative to the non-retarded group (X̄ = 12.3, S.D. = 5.69)
they were not responding randomly. The number of intrusions per line did not differ significantly
within either of the non-retarded or retarded groups. The number of correct responses, on the
other hand, equalled 647/1200 or 53.91 per cent for the non-retarded group and 171/1200 or 14.25
per cent for the retarded group. Again, both percentages were above the 4 per cent value.
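
The guessing baselines used in this argument follow from simple proportions. The Python sketch below is illustrative only and is not part of the original analysis. The variable names are assumptions, and the counts are those quoted in the text.

```python
ALPHABET_SIZE = 25        # letters in the partial report set (the letter I excluded)
ARRAY_SIZE = 10           # letters shown in each stimulus array
RESPONSES_PER_GROUP = 1200  # 10 subjects x 120 partial report trials


def percent(count, total):
    """Express a count as a percentage of a total."""
    return 100.0 * count / total


# Expected values if a subject simply named a letter at random.
chance_correct = percent(1, ALPHABET_SIZE)
chance_intrusion = percent(ALPHABET_SIZE - ARRAY_SIZE, ALPHABET_SIZE)

# Observed counts quoted in the text.
observed = {
    "non-retarded": {"correct": 647, "intrusions": 123},
    "retarded": {"correct": 171, "intrusions": 318},
}

print(f"chance correct: {chance_correct:.2f}%  chance intrusions: {chance_intrusion:.2f}%")
for group, counts in observed.items():
    correct_pct = percent(counts["correct"], RESPONSES_PER_GROUP)
    intrusion_pct = percent(counts["intrusions"], RESPONSES_PER_GROUP)
    print(f"{group}: correct {correct_pct:.2f}%, intrusions {intrusion_pct:.2f}%")
```

Both groups exceed the guessing rate for correct responses and fall below the guessing rate for intrusions, which is the comparison the argument rests on.
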

Investigations of partial report performance as related to the serial position of the cued letter
in the stimulus array typically yield a ‘w’-shaped function (Averbach & Coriell, 1961; Haber &
Standing, 1969; Merikle, Lowe & Coltheart, 1971; Townsend, 1973; Pennington & Luszcz, 1975).
The ‘w’-shaped function is believed to be due to the relatively better acuity for items located
nearest to the fixation point combined with less metacontrast interference for the end items
(Haber & Standing, 1969). An examination of the pattern of responding was undertaken by
plotting the mean number of correct letters for each group as a function of the cued letter’s
serial position in each of the upper and lower lines of the visual display (Fig. 2). As indicated in
Fig. 2, the ‘w’-shaped function was not demonstrated by the subjects in the present study.
There did appear to be better acuity for items nearest the fixation point for both groups but the
better performance for the end items was not demonstrated by either group.

Figure 2. The mean number of letters correctly identified by the non-retarded and retarded groups as a
function of line in the stimulus display and serial position of the cued letter.


Partial report-whole report comparison 

The percentages of correct responses for the retarded and non-retarded groups under the whole
report and the partial report procedures were examined employing a group (retarded;
non-retarded) by sequence (whole report-partial report; partial report-whole report) by procedure
(partial report; whole report) mixed analysis of variance. As in previous research, the scoring for
the whole report procedure data was in terms of the correct letter in the correct position for the
40 experimental trials of the present study. The group (2) by sequence (2) by procedure (2)
mixed analysis of variance carried out on the square root transformed percentage of correct
response data (significant Fmax test; P < 0.01) revealed a significant group main effect (F = 38.88,
d.f. = 1, 16, P < 0.001) with the non-retarded group demonstrating superior overall performance.
A significant procedure main effect (F = 13.37, d.f. = 1, 16, P < 0.01) was also obtained indicating
that performance under the partial report procedure was superior to that under the whole report
procedure. The group x procedure interaction effect (Fig. 3) also reached significance (F = 14.18,
d.f. = 1, 16, P < 0.01). As seen in Fig. 3, the significant interaction effect was accounted for by
the relative performance superiority of the non-retarded group under the partial report
procedure. The remaining main and interaction effects failed to reach statistical significance.

Figure 3. The percentage of correct responses (square root transformation) for the non-retarded and
retarded groups under the whole and partial report procedures.


Encoding


In order to examine the nature of the encoding process hypothesized to exist between iconic
memory and short-term memory an analysis of the errors of commission under the whole report
procedure for each of the groups was undertaken. An analysis of the total number of errors of
commission revealed that the retarded group (X̄ = 98.8, S.D. = 43.66) made a significantly greater
number of errors (t = 3.96, d.f. = 18, P < 0.01) relative to the non-retarded group (X̄ = 38.1,
S.D. = 20.96).
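
The reported t value can be recovered from the summary statistics alone. The Python sketch below assumes an independent-groups t test with pooled variance, ten subjects per group and sample standard deviations. The source reports only d.f. = 18, which is consistent with that choice but does not confirm it. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-groups t statistic with pooled variance, computed from
    group means, sample standard deviations and group sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Errors of commission under the whole report procedure (retarded vs non-retarded).
t_errors, df_errors = pooled_t(98.8, 43.66, 10, 38.1, 20.96, 10)

# Intrusions under the partial report procedure (retarded vs non-retarded).
t_intrusions, df_intrusions = pooled_t(31.8, 12.23, 10, 12.3, 5.69, 10)

print(f"errors of commission: t = {t_errors:.2f}, d.f. = {df_errors}")
print(f"intrusions:           t = {t_intrusions:.2f}, d.f. = {df_intrusions}")
```

With these inputs the statistic rounds to 3.96 on 18 d.f. for the errors of commission, and the same function applied to the intrusion data gives 4.57, in agreement with the values reported in the text.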

An examination of the type of errors of commission employed the following error categories: 

Acoustic errors. Acoustic errors were those where the substituted letter was one which 
sounded like the correct letter. The criteria for auditory confusability were those derived by 
Conrad (1964) (e.g. C-B; T-P; F-S; M-N).

Visual errors. Visual errors were those where the substituted letter was one which looked like 
the correct letter. The method employed for establishing visual confusability was that suggested 
by Keele & Chase (1967) employing the percentage overlap in letter contour (e.g. O-Q; F-E; 
B-R; C-G). 

Position errors. Position errors consisted of errors where the letter or letters involved were
correct but their position was inaccurate. There were three types of position errors: (a)
reversal: where the positions of two adjacent letters were exchanged; (b) horizontal shift: where
a letter or a pair of adjacent letters were positioned one space to the right or to the left of the
presented position; and (c) vertical shift: where the position of a letter was shifted directly up or
shifted directly down from its presented position.

Non-specific errors. Those errors of commission that did not conform to any of the acoustic,
visual or position criteria. From inspection, it appeared that there was no pattern or consistency 
to these errors. 


Table 1. The errors of commission for the non-retarded and retarded groups across each error 
category under the whole report procedure 


                                          Position errors
                                 ----------------------------------
Group          Acoustic  Visual  Reversal  Horizontal  Vertical  Non-specific
                                           shift       shift
Non-retarded   32        15      20        64          25        225
Retarded       61        47      9         74          44        753


The errors of commission were categorized according to the above criteria (Table 1) and
subjected to a chi-square analysis which revealed a differential response pattern for the retarded
and non-retarded groups (χ² = 64.26; d.f. = 5, P < 0.001).
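
The chi-square statistic for a 2 x 6 table of this kind can be computed directly from the cell counts. The Python sketch below is illustrative only and is not the author's own computation. It assumes a Pearson chi-square test of independence without continuity correction, and it takes the retarded group's horizontal shift count as 74, the value consistent with the reported group total of 988 errors (mean 98.8 for ten subjects) and with the 12.84 per cent share of position errors. The function name is hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Columns: acoustic, visual, reversal, horizontal shift, vertical shift, non-specific.
errors_of_commission = [
    [32, 15, 20, 64, 25, 225],  # non-retarded group (total 381)
    [61, 47, 9, 74, 44, 753],   # retarded group (total 988; 74 is an inferred value)
]

chi2, df = pearson_chi_square(errors_of_commission)
print(f"chi-square = {chi2:.2f}, d.f. = {df}")
```

Under these assumptions the statistic comes out close to the reported value on 5 d.f.; a small discrepancy in the second decimal place would be expected from rounding in the original hand calculation.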

As noted in Table 1, the retarded subjects made a greater percentage of non-specific errors
(76.21 per cent) relative to the non-retarded subjects (59.05 per cent). The non-retarded subjects,
however, made a greater percentage of position errors (28.59 per cent) relative to the retarded
subjects (12.84 per cent) with the horizontal shift position errors accounting for the majority of
the position errors (16.79 per cent). Both the retarded and non-retarded subjects were
comparable for the percentage of errors under the acoustic and visual error categories.


Discussion
Iconic memory 


The performance of the non-retarded subjects under the partial report procedure in the present
study (5.4 letters) was consistent with the performance (5.2 letters) reported by Keele & Chase
(1967) employing a 50 msec cue delay but inconsistent with that (3.58 letters) reported by
Pennington & Luszcz (1975). These authors concluded that their partial report procedure did not 
indicate that more letters were available to the subject relative to the number that could be 
reported under the whole procedure, a conclusion that is inconsistent with a large body of 
literature. 

An examination of the partial report performance of the retarded subjects revealed that, 
although their correct recall (1.43 letters) was significantly poorer than that of the non-retarded
subjects, their performance was above the chance level. This raises three possibilities with 
respect to the retarded subjects’ performance. 

First, iconic memory may be of shorter duration for retarded individuals relative to 
non-retarded individuals. Although the present data do not directly assess the duration of iconic 
store in retarded individuals it is most likely the case that iconic store is of sufficient duration to 
allow for the better-than-chance performance by these subjects on the partial report task. From 
a developmental perspective it has been demonstrated that children are comparable to adults on 
visually cued partial report tasks employing a 50 msec cue delay (Haith, 1971; Sheingold, 1973) 
suggesting that age differences in the duration of iconic store are probably not significant beyond 
five years. 

Second, the physical feature analysis of the items in the iconic store may be less efficient in 
retarded relative to non-retarded individuals. As Pennington & Luszcz (1975) suggest, the 
stimulus display must be feature analysed before items can be read out into short-term memory. 
The problem here would seem to be that the retarded subject is less efficient in utilizing the 
physical features of the stimulus display. The pattern of responding demonstrated in Fig. 1 
suggests that the retarded subjects are able to employ the spatial information available in the 
display in that their error pattern is similar to that of the non-retarded group. The highest
percentage of errors was found in letter positions immediately adjacent to the cued letter. The 
serial position response curves (Fig. 2) also suggest that the greatest response accuracy was 
found for those positions occupying the middle of the stimulus array, i.e. those closest to the 
fixation point. 

Third, the scanning of the iconic store may be less efficient in retarded relative to non-retarded 
individuals. The retarded subjects may not have access to perceptual encoding strategies that 
make the readout of a simultaneous input task manageable. The findings of Libkuman & 
Friedrich (1972) tend to support this possibility. These authors demonstrated that their retarded 
subjects required an exposure interval three times greater than that required by their 
non-retarded subjects. 

The retarded subjects in the present study apparently were not able to handle the stimulus 
input efficiently and their responding, both in terms of their overall percent correct recall under 
the partial report procedure and the relatively high percentage of non-specific errors of 
commission under the whole report procedure, was consistent with situations in which these 
individuals experience a stimulus overload. 


Encoding and short-term memory 


An examination of the errors of commission failed to support the suggestion that a 
visual-to-auditory encoding may occur between iconic storage and short-term memory. Were a 
visual-to-auditory encoding process to have been employed, one would expect a greater number 
of errors of commission to be of the acoustic confusability category. A comparison of the error 
data revealed that the retarded and non-retarded subjects were comparable for acoustic as well 
as visual errors. They did differ with respect to position and non-specific errors. 

The higher percentage of non-specific errors made by the retarded group suggests that there
was no particular encoding strategy employed by these subjects and their whole report
performance was consistent with this suggestion. It is interesting to note that both the partial 
report and the whole report performance scores were comparable for the retarded subjects. This 
tends to support the earlier suggestion that a stimulus array consisting of ten items may be an 
overload situation for the retarded subjects. It must be borne in mind that the stimuli employed 
in the present study consisted of letters and the present argument based upon a limited capacity 
must be demonstrated to hold for non-verbal stimuli as well. 

The relatively lower percentage of non-specific errors and the higher percentage of position 
errors made by the non-retarded group suggests that the readout from iconic memory was 
sensitive to position shifts. Further, the higher percentage of position errors made by the 
non-retarded subjects is consistent with Sperling (1967) who argued that the operational mode of 
the readout from iconic to short-term memory may be parallel rather than serial. He noted that 
all of the items in all positions in a visual array had some probability of being reported correctly 
even after the shortest durations and that this would be unreasonable under the condition that 
the subject completes the processing of one item before he can report any information about a 
second item. The finding that a relatively higher percentage of errors of commission were of the 
position type for the non-retarded subjects indicates that these subjects were able to hold an 
even greater number of stimuli in short-term memory than their percentage correct recall scores 
would suggest. The non-retarded subjects were able to report accurately a mean of 3.33 letters
per display in contrast to the retarded subjects who were able to accurately report only 1.34
letters per display under the whole report procedure. 

The significant group x procedure interaction effect refutes Holding’s (1975) contention that the 
excess capacity of iconic memory, as indicated by partial report superiority in partial-whole
report comparisons, is due to the artifact of cue anticipation. Holding’s (1975) suggestion that the 
subjects can anticipate the cue position is highly improbable in the present study, since the bar 
marker appeared randomly four times for each of the ten letter positions over the 40 partial 
report experimental trials. For each of the three partial report experimental sessions the 
probability of chance responding was P = 0.025 for trial one increasing to P = 0.05 for trial 21,
given that the subject was able to monitor efficiently the preceding 20 positions, an unlikely 
occurrence. 

The artifact of output interference has also been suggested by Holding (1975) to account for
partial report superiority in partial-whole report comparisons. Output interference can be 
assessed by comparing the order of report accuracy for the whole and partial report procedure 
as a function of serial position in the response sequence. Although such a comparison was not 
made in the present study it is likely that the present data would indicate that the items in the 
upper-left section of the stimulus array would be reported more accurately under the whole 
report relative to the partial report procedure in accord with Holding’s (1975) suggestion. 
However, such a finding could also be interpreted within the context of a metacontrast effect for 
items immediately adjacent to the left end item in the stimulus array leading to relatively poorer 
partial report performance for these items. Dick (1971) points out that, although his data argue
against selection occurring during iconic memory, it is not possible to employ the same data to 
argue against the presence of a fading memory trace as Holding (1975) has suggested. 

In summary, the present data suggest that the retarded individuals may be deficient in the 
control processes related to memory and that this deficiency most probably relates to the 
inaccessibility, on the part of retarded subjects, of perceptual encoding strategies that enable
them to process simultaneous input tasks efficiently. This deficit is demonstrated at the iconic 
memory level when verbal stimuli are employed and has yet to be demonstrated employing 
non-verbal stimuli. The inability to access perceptual encoding strategies in the context of 
supraspan verbal stimuli also leads to poorer short-term memory performance. The training in 
and experience with efficient and successful strategies on the part of retarded individuals should
promote an increase in the efficient processing of information. 


A study of psychological well-being


Peter Warr 


The concept of psychological well-being is introduced, and scales to measure three of its different facets are
described and applied to 1655 British respondents. Results from measures of positive and negative affect are
compared with North American findings, and hypotheses are broadly confirmed. Two clusters of specific
anxiety items are identified, to do with financial and family anxiety and with health anxiety. The third
measure (ratings of present life in general) yields a major cluster of happiness items, but suggests additional
dimensions for more detailed investigation. Interrelationships between the several measures and with
employment position, motivation to work, job characteristics and age are examined. The study of everyday 
life as ‘normal psychology’ is advocated. 


The aim of this paper is to explore the measurement and some correlates of psychological 
well-being. This is a somewhat malleable concept which is to do with people’s feelings about 
their everyday-life activities (e.g. Bradburn, 1969; Warr & Wall, 1975; Campbell, 1976). Such 
feelings may range from negative mental states (dissatisfaction, unhappiness, worry, etc.) 
through to a more positive outlook which extends beyond the mere absence of dissatisfaction (as 
health is something beyond the mere absence of illness) into a state which has sometimes been 
identified as positive mental health (e.g. Jahoda, 1958; Herzberg, 1966; Berg, 1975). The 
definition of positive mental health is especially difficult, since the concept is both 
multidimensional and value-laden, but it is usually considered to include such features as 
favourable self-evaluation, growth and learning from new experience, a realistic freedom from 
constraints and some degree of personal success in valued pursuits. 

Psychological well-being is thus a wide-ranging concept which embraces affective aspects of 
everyday experience. The operationalization of this concept is even more difficult than its 
description. Negative and bivalent components of well-being are relatively easily assessed 
through self-reports of, for example, anxiety, happiness, job satisfaction or personal esteem, but 
the content and structure of feelings of those kinds are still sorely in need of exploration. The 
more positive aspects of well-being have been widely discussed (for instance by Maslow, 1973, 
and his followers), but little progress has been made in their measurement. 

Increased understanding of ill-defined psychological concepts comes about through an iterative 
process of measurement and redefinition, so that a current need is for empirical data obtained 
through potentially useful measures, each looking at different facets of the overall concept. 
Three kinds of measure are examined here. One taps reported anxiety about specific features of 
everyday life, a second takes ratings of life in general, and a third obtains material about 
positive and negative affect. 

These last components of psychological well-being were extensively studied by Bradburn 
(1969). On the basis of large-sample survey investigations in the United States he argued that 
positive and negative affect were uncorrelated: a person’s position on one of the two dimensions 
was not predictable from his position on the other. Furthermore, the two dimensions were seen 
to be related to quite different sets of variables. Positive affect was associated with higher levels 
of social contact and more exposure to new experiences, whereas negative affect was 
uncorrelated with these. On the other hand, negative affect was found to be associated with 
various indices of anxiety, fears of a nervous breakdown and physical symptoms of ill-health; 
but positive affect was not related to these. A respondent’s educational level was significantly 
associated with reported positive affect, but not with negative affect. Other North American 
studies have been reported by Phillips (1967), Andrews & Withey (1974), Beiser (1974) and 
Cherlin & Reeder (1975). 

Bradburn’s view of well-being has some similarity with the two-factor theory of job 
satisfaction presented by Herzberg, Mausner & Snyderman (1959). Subsequent research has not 
supported their identification of two completely separate types of job satisfaction, and Bradburn 
does not argue that positive and negative affect are always differentially associated with other 
features. For example, indices of general happiness and life satisfaction were found to correlate 
to an equal extent (but naturally in opposite directions) with both positive and negative affect. 

Bradburn’s (1969) results may be summarized in terms of three categories of factors: 


            Correlation with
            Bradburn’s measures
Category    Positive  Negative
of factor   affect    affect    Illustrations from Bradburn’s data

A           No        Yes       Fear of nervous breakdown. Poor physical health
B           Yes       Yes       General happiness. Life satisfaction
C           Yes       No        Frequency of social contacts. Participation in new
                                activities. Level of education


The present study included a test of the replicability of Bradburn’s North American results on 
a British sample. It was also designed to investigate additional measures of facets of 
psychological well-being, their internal structure and their relationships with each other and 
some external variables. 


Procedure 
The sample 


The material presented in this paper was gathered during an interview lasting approximately 40 minutes. This 
usually took place in the respondent’s home and was conducted by an experienced interviewer of the 
Schlackman Research Organization. The interviews were part of a follow-up study of redundant steel 
workers six months after closure of their works on the outskirts of Manchester. Other aspects of this study 
are described by Warr & Lovatt (1977). 

The sample comprised 1655 respondents, representing 78 per cent of the redundant group. It was not 
possible to contact 13 per cent of the population, because they had moved house or were unavailable despite 
three calls. The remaining 9 per cent were contacted but not interviewed, because of illness, etc. (4 per 
cent), or refusal (5 per cent). Ninety-seven per cent of the sample were men, and the age distribution was as 
follows: under 25 years, 7 per cent; 25 to 34, 17 per cent; 35 to 44, 19 per cent; 45 to 54, 25 per cent; 55 years 
and above, 31 per cent. Approximately one-third (537) of the respondents were unskilled workers, 430 had 
previously held semi-skilled jobs, 462 were skilled workmen, 137 were from technical and professional 
grades, and 60 were clerks. 


The measures 


Positive and negative affect. Material of the kind studied by Bradburn (1969) was gathered through this form 
of spoken question: ‘During the last few weeks did you ever feel any of these things? Pleased about having 
accomplished something? Particularly excited or interested in something? Bored?’ (and so on). Answers were 
either yes or no, given orally and recorded by the interviewer. Bradburn used ten items, but an additional 
five were included in this study to examine the possibility of differing interrelationships arising from use of 
other items. In practice the internal structure of responses was the same with or without the new items, and 
Bradburn’s scales were therefore retained to allow comparability. (A table of means, variances and 
intercorrelations is obtainable from the author.) Total scores were derived for each respondent for positive 
and negative affect in terms of the frequency of yes answers to the five questions in each case. Note that a 
high negative affect score is on this basis an indicator of low well-being. Bradburn’s ten items were as 
follows: 
Positive affect: Pleased about having accomplished something? That things were going your way? Proud 
because someone complimented you on something you had done? Particularly excited or interested in 
something? On top of the world? 

Negative affect: So restless that you couldn’t sit long in a chair? Bored? Depressed or very unhappy? Very 
lonely or remote from other people? Upset because someone criticized you? 

In administration the two sets of items were unsystematically mixed together. Note that the issues covered 
by the ten items are general ones which make no mention of work, redundancy or specific features of family 
or career. 
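
The scoring rule described above (a total for each scale equal to the number of yes answers to its five items) can be sketched as follows. This is an illustrative sketch only: the short item keys and the example respondent are invented here and are not taken from the survey material.

```python
# Short keys standing for Bradburn's ten items as listed above.
# The key names are labels invented for this sketch.
POSITIVE_ITEMS = ["accomplished", "going_your_way", "proud", "excited", "top_of_world"]
NEGATIVE_ITEMS = ["restless", "bored", "depressed", "lonely", "upset"]


def affect_scores(answers):
    """Return (positive, negative) totals, each a count of 'yes' answers (0 to 5).

    `answers` maps an item key to True (yes) or False (no).
    """
    positive = sum(1 for item in POSITIVE_ITEMS if answers[item])
    negative = sum(1 for item in NEGATIVE_ITEMS if answers[item])
    return positive, negative


# Hypothetical respondent, not drawn from the study data.
respondent = {
    "accomplished": True, "going_your_way": True, "proud": False,
    "excited": True, "top_of_world": False,
    "restless": False, "bored": True, "depressed": False,
    "lonely": False, "upset": False,
}
print(affect_scores(respondent))  # (3, 1)
```

A high second value indicates low well-being, in line with the note above on the direction of the negative affect score.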


Anxiety levels. A second measure was of reported anxiety about ten identified areas (not having enough 
money for day-to-day living, your family, your health, etc.). Respondents replied to these questions by 
indicating which of 11 scale points (from ‘not at all’ to ‘a great deal’) most fitted their level of worrying in 
the past few weeks. The response alternatives were shown on a card, and the respondent pointed to the 
most appropriate one. The interviewer recorded the answer as a score from 0 to 10. In addition, an overall 
anxiety report was obtained after answers had been given about the ten separate issues. This was in terms of 
the question ‘In general, how much would you say you worry these days?’ The full set of items is 
summarized in Tables 2 and 3, identified as A1 to A10 (specific anxieties) and A11 (general anxiety). 

Most of these items and the general mode of presentation had previously been used by the SSRC Survey 
Unit. Their content is such that they were thought likely to fall within category A in the summary presented 
above, being significantly correlated with Bradburn’s scale of negative affect but not associated with his 
measure of positive affect. 


Feelings about present life. Bradburn’s results suggested that the content of the third measure of well-being 
was more likely to fall within category B, correlated with both scales. Respondents were asked to think 
about their present life, in general, and to describe it in terms of 11 adjectival scales. These ran, for 
example, from ‘boring’ to ‘interesting’ and were separated by seven boxes of semantic differential layout 
identified as ranging from ‘extremely’ through ‘fairly’ to ‘neither’ at the midpoint. Answers were scored 
from 1 to 7 where 7 always referred to the positive pole, although the direction of the scales was varied in 
administration. The items were among those which had previously been employed by the Michigan Institute 
for Social Research (Andrews & Withey, 1974) and the SSRC Survey Unit (Hall & Ring, 1974). They are 
listed in Tables 2 and 3, identified as PL1 to PL11. 
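
Because the printed direction of the scales was varied while 7 always denoted the positive pole, scales printed with the positive adjective on the left have to be reversed at scoring. A minimal sketch of that step is given below; the function name and the convention of counting boxes from the left are assumptions made for illustration, not details reported for the interview schedule.

```python
def score_rating(box, positive_pole_on_right):
    """Score a seven-box rating so that 7 always means the positive pole.

    `box` is the position ticked, counted 1 to 7 from the left of the printed
    scale (an assumed convention for this sketch).
    """
    if not 1 <= box <= 7:
        raise ValueError("box must be between 1 and 7")
    # A scale printed with the positive adjective on the left is reversed.
    return box if positive_pole_on_right else 8 - box


print(score_rating(6, positive_pole_on_right=True))   # 6
print(score_rating(2, positive_pole_on_right=False))  # 6
```

The midpoint (‘neither’) scores 4 whichever way the scale is printed.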


Table 1. Percentage of the present sample answering ‘yes’ to each of Bradburn’s items, with 
data from one British and three North American studies 

                                                        Present  SSRC sample,  Bradburn, 1969,  Phillips,  Cherlin &
                                                        sample   1975          Wave 1           1967       Reeder, 1975
Feeling-state item                                      (UK)     (UK)          (USA)            (USA)      (USA)

During the last few weeks did you ever feel... 

Positive feelings 
1. Pleased about having accomplished something?         68       60            78               80         75
2. That things were going your way?                     60       60            64               72         73
3. Proud because someone had complimented you           51       43            67               55         71
   on something you had done?
4. Particularly excited or interested in something?     43       40            56               53         70
5. On top of the world?                                 46       41            29               31         39

Negative feelings 
1. So restless that you couldn’t sit long in a chair?   31       24            48               27         34
2. Bored?                                               31       28            38               26         33
3. Depressed or very unhappy?                           19       24            33               19         29
4. Very lonely or remote from other people?             11       18            27               18         30
5. Upset because someone criticised you?                8        14            21               17         20

Number of respondents                                   1655     932           2787             600        2086
Percentage of women in sample                           3        58            55               51         56
Results 
Positive and negative affect scores 


In view of the need for illustrative norms in this relatively unexplored area, Table 1 brings 
together the percentage of the present and other samples who responded ‘yes’ to each of the 
Bradburn items. The SSRC data are unpublished and were kindly provided by John Hall. 

It is difficult to make interpretative comparisons between the findings from the present study 
and those from elsewhere because of two possible confounding factors. The first is the 
differential sex composition of the samples; the final row in Table 1 shows that while the present 
sample was overwhelmingly male the others contained a majority of women. Bradburn (1969, 
p. 90) reported that women agree with significantly more negative well-being items than men, and 
Phillips (1967, p. 485) found that women agreed more with both negative and positive items. 

The second possible confounding factor is education level, which the North American studies 
have found to be significantly associated with positive well-being scores. The present sample of 
1655 contained only five people with degree-level qualifications and only 28 with Ordinary or 
Higher National Certificate; this is likely to represent a lower overall educational level than in 
the other studies. 


Correlations within the measures 


The internal structure of the positive and negative affect scales was found to be very similar to 
that described by Bradburn. The median point-biserial correlation between item and total score 
corrected for auto-correlation (Guilford & Fruchter, 1973, p. 455) was 0.47 for the positive 
scale and 0.48 for the negative scale. The average product-moment correlation within the five 
positive items (yes or no response only and therefore restricted variance) was +0.26, with a 
range from +0.16 to +0.40. Within the negative items this value was +0.24 (with a range from 
+0.13 to +0.42), but between the two sets of items the median value was -0.08 (ranging from 
+0.07 to -0.20). 
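
The idea behind an item-total correlation corrected for auto-correlation can be illustrated by correlating each item with the total of the remaining items, so that the item does not contribute to the score it is compared with. The sketch below uses this rest-score approach rather than the Guilford & Fruchter formula itself, which is not reproduced here; the response matrix is hypothetical. With a yes/no item scored 1 or 0, the product-moment correlation computed here is the point-biserial correlation.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(responses, item):
    """Correlate one item with the sum of the other items (the rest score)."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)


# Hypothetical yes/no answers (1 = yes, 0 = no) from six respondents to five items.
responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
]
for item in range(5):
    print(item + 1, round(corrected_item_total(responses, item), 2))
```

Leaving the item in the total would inflate the coefficient, most noticeably for short scales such as these five-item ones.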

The total scores for positive and negative affect were however not quite as independent as 
Bradburn has found. The intercorrelation was -0.21. With the present sample size this 
correlation is of course statistically significant but it is sufficiently meagre to be of interest, and 
the two scales are treated separately in the analyses which follow. 

The ten specific anxiety reports in the second measure (A1 to A10) were all positively 
intercorrelated, with a range of values from +0.10 to +0.57. The pattern of correlations is 
shown in Table 2, which also includes the correlations between the specific anxiety scores and 
the general anxiety score (A11); these range from +0.19 to +0.57. Although there is clearly a 
general component in the ten specific scores, two main clusters of items may be distinguished. 
Hierarchical cluster analysis by the complete linkage method (applying a fusion criterion of 0.25) 
indicated one group of items (A1, A2, A5, A6, A10) covering financial and family anxiety and a 
second cluster (A4, A8, A9) concerned with health anxiety. Items A3 and A7 emerged as single 
outlying items. The items in the two principal clusters of specific anxiety items yielded average 
correlations with general anxiety (A11) of 0.48 and 0.41 respectively. 
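
The logic of complete linkage clustering with a fusion criterion can be sketched as follows, on the assumption that clusters are fused while the weakest correlation between any pair of their members stays at or above the criterion. The items P to U and their correlations are invented for illustration; the program actually used, its handling of ties, and any transformation of correlations into distances are not reported here.

```python
def complete_linkage(corr, labels, criterion):
    """Agglomerative clustering on a correlation matrix by complete linkage.

    Two clusters are fused only if the lowest correlation between any member
    of one and any member of the other is at least `criterion`.
    """
    clusters = [[i] for i in range(len(labels))]

    def link(a, b):
        # Complete linkage: the weakest correlation across the two clusters.
        return min(corr[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                value = link(clusters[x], clusters[y])
                if best is None or value > best[0]:
                    best = (value, x, y)
        value, x, y = best
        if value < criterion:
            break  # no remaining pair of clusters meets the fusion criterion
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(sorted(labels[i] for i in c) for c in clusters)


# Hypothetical correlations among six items (not the Table 2 values).
labels = ["P", "Q", "R", "S", "T", "U"]
pairs = {
    ("P", "Q"): 0.50, ("P", "R"): 0.40, ("Q", "R"): 0.45, ("S", "T"): 0.55,
    ("P", "S"): 0.15, ("P", "T"): 0.10, ("Q", "S"): 0.20, ("Q", "T"): 0.12,
    ("R", "S"): 0.18, ("R", "T"): 0.22, ("P", "U"): 0.10, ("Q", "U"): 0.12,
    ("R", "U"): 0.14, ("S", "U"): 0.11, ("T", "U"): 0.13,
}
index = {name: i for i, name in enumerate(labels)}
corr = [[1.0] * len(labels) for _ in labels]
for (a, b), r in pairs.items():
    corr[index[a]][index[b]] = r
    corr[index[b]][index[a]] = r

print(complete_linkage(corr, labels, criterion=0.25))
# [['P', 'Q', 'R'], ['S', 'T'], ['U']]
```

In this toy example item U remains a single outlying item, in the same way that A3 and A7 are described above.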

The associations within the bipolar scales measuring feelings about present life (PL1 to PL11) 
were also strongly positive. A principal factor of happiness-unhappiness appeared to include all 
items except PL6 (hard - easy) and PL8 (controlled by others - under my control). Formal 
application of cluster analysis techniques would be redundant here, since Table 2 reveals that the 
lowest correlation within the happiness cluster (0.44) is greater than the highest correlation 
between PL6 or PL8 and any of the other PL items (0.37). However, these two items are 
themselves relatively independent of each other (r = 0.27). 
Correlations between the measures 


The positive and negative affect scores (PA and NA in Table 2) were found to intercorrelate with 
the other measures of well-being in the predicted manner. The anxiety items (A1 to A11) were 
expected to fall within category A of the framework suggested in the Introduction, being 
associated with negative affect but not with positive affect. Item A9 is very similar to one of 
Bradburn’s questions, and the correlations of -0.06 and +0.39 provide a striking confirmation of 
his results in a very different sample. This pattern is observed for items A2, A4, A5, A7, A8, A9 
and A10, whereas A1, A3, A6 and A11 yield more equivocal results. However, in every case the 
correlation with negative affect is considerably larger than with positive affect. 

On the other hand, the ratings of present life are equally strongly linked with both positive and 
negative affect. The mean correlation value for the nine items of the happiness cluster (i.e. 
excluding PL6 and PL8) is 0.37 for both PA and NA. The average correlation between PA and 
NA and PL6 and PL8 is much lower (0.19). As expected, then, those aspects of well-being tapped 
by the happiness cluster fall into the second category described in the Introduction. 

It had been anticipated that the two anxiety clusters would be differentially associated with the 
happiness cluster of the present life scales. This was not so: the average correlation with 
financial and family anxiety was -0.21, and with health anxiety it was -0.18. 


Correlates of well-being 


The several facets of well-being identified through the present measures may now be examined 
in relation to other factors. The psychological aspects of unemployment deserve systematic 
study, and some evidence will be presented next. Following that, brief examinations of work 
attitudes and of age differences will be made. 

At the time of interview just over 50 per cent (891) of the redundant workers had located a 
job. The pattern of differences between the employed and unemployed subsamples is shown in 
Table 3. The results are consistent for the two affect measures (PA and NA) and for the 
happiness cluster of the present life scales (i.e. excluding PL6 and PL8); the unemployed people 
report significantly lower well-being than those who have work. The same finding emerges for 
general anxiety (A11), and for four specific anxiety items. Differential anxiety about money (A1), 
health (A4) and job (A6) are accompanied by the unemployed’s greater worry about the world 
situation (A7). 

This general association between aspects of well-being and having a job may be more closely 
characterized through analyses in terms of the extent to which respondents value having work. 
Two groups were created for analysis on the basis of answers to questions about their desire to 
find a job rather than remain unemployed. The groups contained 920 and 278 members, omitting 
those with mid-range scores; see Warr & Lovatt (1977). The first subsample was identified as 
being of high ‘work orientation’ and the second as being of low ‘work orientation’. This factor 
might be expected to moderate the previously demonstrated relationship between employment 
position and aspects of well-being. 

Table 4 presents the relevant material about positive and negative affect measured by the 
Bradburn scales. Separate two-way analyses of variance have been carried out for each set of 
data. In both cases the main effect of employment was highly significant (this effect for the 
complete sample has already been described), but the overall difference between high and low 
work orientation was significant only for negative affect (1.21 against 0.57). It is the interaction 
term which is of major interest, and this was statistically significant at the 0.001 level in both 
cases. 

Table 3. Mean values of well-being for employed and unemployed respondents (n = 891, 764) 

                                                            Employed  Unemployed  Significance
PA    Positive affect                                       2.99      2.29        0.001
NA    Negative affect                                       0.80      1.25        0.001
A1    Not having enough money for everyday living           2.55      3.57        0.001
A2    Your financial debts, such as HP, mortgage, etc.      1.35      1.47        n.s.
A3    Relations with neighbours                             0.32      0.34        n.s.
A4    Your health                                           1.25      2.54        0.001
A5    Your family                                           2.43      2.68        n.s.
A6    Your job situation                                    2.49      4.38        0.001
A7    The world situation                                   4.35      4.93        0.01
A8    Growing old                                           1.25      1.57        n.s.
A9    That you might have a nervous breakdown               0.70      1.09        n.s.
A10   That you might be made redundant again in future      1.87      1.69        n.s.
A11   Worry in general                                      2.80      3.66        0.001
PL1   Boring - interesting                                  5.62      4.83        0.001
PL2   Miserable - enjoyable                                 5.79      5.28        0.01
PL3   Disappointing - rewarding                             5.33      4.41        0.001
PL4   Empty - full                                          5.69      5.11        0.001
PL5   Discouraging - hopeful                                5.81      5.14        0.001
PL6   Hard - easy                                           4.49      4.63        n.s.
PL7   Frustrating - fulfilling                              5.14      4.23        0.001
PL8   Controlled by others - under my control               5.16      5.20        n.s.
PL9   Unsuccessful - successful                             5.34      4.51        0.001
PL10  Doesn’t give me a chance - brings out the best in me  5.04      4.18        0.001
PL11  Unhappy - happy                                       6.19      5.69        0.01


Table 4. Differential salience as a moderator of the association between employment position and 
positive and negative affect 

                                 Mean positive affect score                       Mean negative affect score
                                 High work    Low work                            High work    Low work
                                 orientation  orientation                         orientation  orientation
                                 group        group        Overall  Significance  group        group        Overall  Significance
People who have a job (573, 98)  3.10         2.72         3.05     0.01          0.88         0.55         0.83     0.01
People without a job (347, 180)  2.14         2.61         2.30     0.001         1.77         0.57         1.36     0.001
Overall (920, 278)               2.75         2.67         2.73     n.s.          1.21         0.57         1.06     0.001
Significance                     0.001        n.s.         0.001                  0.001        n.s.         0.001


The pattern of scores in Table 4 makes the point that employment position is related to these 
aspects of well-being only for the high work orientation group: for these people employment 
position is significantly associated with both positive and negative affect. For those who are less 
concerned about finding work (the low work orientation group), affect of both kinds is 
uninfluenced by employment position. 
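
The moderating effect can be made concrete by computing, from the cell means in Table 4, the employed-minus-unemployed difference within each work orientation group and then the difference between those two differences. The sketch below is purely descriptive arithmetic on the published means; it does not reproduce the analyses of variance or their significance tests, and the function and variable names are invented for illustration.

```python
# Cell means from Table 4: (high work orientation, low work orientation).
TABLE_4 = {
    "positive affect": {"employed": (3.10, 2.72), "unemployed": (2.14, 2.61)},
    "negative affect": {"employed": (0.88, 0.55), "unemployed": (1.77, 0.57)},
}


def interaction_contrast(cells):
    """Return (gap in high group, gap in low group, difference of the gaps).

    Each gap is the employed mean minus the unemployed mean.
    """
    high_employed, low_employed = cells["employed"]
    high_unemployed, low_unemployed = cells["unemployed"]
    gap_high = high_employed - high_unemployed
    gap_low = low_employed - low_unemployed
    return round(gap_high, 2), round(gap_low, 2), round(gap_high - gap_low, 2)


for measure, cells in TABLE_4.items():
    print(measure, interaction_contrast(cells))
# positive affect (0.96, 0.11, 0.85)
# negative affect (-0.89, -0.02, -0.87)
```

In both cases nearly all of the employment difference lies in the high work orientation group, which is the pattern the interaction term reflects.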

This theme may be extended in terms of the subsample who were unemployed at the time of 
interview. Information was gathered about how seriously these respondents were looking for 
work. A total of 245 unemployed people indicated that they were actively looking for a job, 
whereas 486 of the subsample reported that they were not really looking at the time of interview. 
It is to be expected that those who are unemployed but seeking work will experience lower 
well-being on both the positive and the negative affect scales than those who are unemployed but 
not looking for jobs. The mean positive affect values for the unemployed and seeking, 
unemployed and not seeking, and for the employed (as in Table 3) were found to be 2.01, 2.46 
and 2.99. Analysis of variance indicated a significant main effect (P < 0.001) and the comparisons 
between means were all significant at the 0.001 level. This pattern of means and statistical 
significance was repeated for the negative affect scale with scores of 1.50, 1.14 and 0.80 for the 
three groups respectively. 

Those members of the sample who had found work were asked to make comparisons between 
their present job and their pre-redundancy job in terms of four characteristics: working 
conditions, job interests, the people in charge, and workmates. The affect measures have a quite 
different content from these job comparisons, but analysis of results from the employed sample 
showed that the latter three job comparisons were significantly related to both positive and 
negative affect at beyond the 0.001 level. Preference for the working conditions of the new job 
was associated with positive affect at this same level, but was not significantly related to 
negative affect. Employed respondents’ overall comparisons between their two jobs were also 
associated with both measures of affect (P < 0.001), as were reports of the degree to which their 
new job used their abilities (‘as fully as I would like’ through to ‘much less fully than I would 
like’). 

The tests of Bradburn’s model presented so far have been restricted to categories A and B of 
the framework described in the Introduction. The interviews also obtained information about 
whether or not the respondent’s wife was herself working. This would appear to fall into 
category C, being concerned with social contacts and the receipt of new information and ideas, 
although correlated factors such as increased family income cannot be ruled out. The average 
positive affect scores of the subsamples with and without a working wife were found to be 2.86 
and 2.39 respectively. This difference is statistically significant at the 0.01 level, but the values 
for negative affect were not significantly different at 1.02 and 1.05. This pattern of a category C 
relationship was retained in separate analyses of the unemployed and the employed respondents; 
in the former case a working wife implies separation during the day, but this is likely to occur 
for employed respondents whether or not their wife has a job. 

Studies of unemployed people (e.g. Daniel, 1974) regularly point to differences associated with 
age. For example, it is the middle-aged and older people who suffer most in economic terms and 
who find it hardest to return to employment (see also Warr & Lovatt, 1977). These differences 
are reflected in the further analyses by age which have been carried out on the material 
summarized in Table 3. The results of these analyses are bulky, and only brief mention of them 
will be made here; interested readers may obtain a more detailed account from the author. 

Two-way analyses of variance by age and employment position indicate many significant main 
effects (those for employment position are shown in Table 3) and interaction terms. Two points 
about the interactions are of particular interest. The largest difference in well-being associated 
with employment position is for the middle-aged group: members of the 45-54 age group who are 
unemployed tend to have particularly low well-being scores on the three types of measure 
employed here. The second point is that the youngest group (under 25 years) is an atypical one, 
in that the employed members of this group tend to have higher anxiety scores than the 
unemployed people of that age. It appears likely that a substantial number of under-25s in the 
sample are relatively carefree in their attitude to work and are content to remain unemployed. 
On the other hand the members of this group who have found work are likely to be concerned 
about family responsibilities and the possibility of another redundancy on a last-in-first-out basis. 
General discussion 


The analyses described in this paper have explored several important facets of psychological 
well-being. Positive and negative affect, as defined by Bradburn (1969), have been shown to 
interrelate with other indices substantially as predicted from North American studies. Their 
independence from each other was however less than that reported previously. Two clusters of 
specific anxiety items have been identified, and further studies based on their separation would 
be worthwhile. The present life items have revealed a major factor of reported happiness, and 
the two items outside that factor deserve incorporation into more comprehensive scales. 

The several facets of well-being are conceptually and statistically distinct but overlapping: 
‘well-being’ is not the same as ‘happiness’, although the latter is a component of the former. 
External features are expected to have varying influence on different facets, as for example 
employment position was here seen to be consistently associated with happiness reports but less 
so with specific anxieties. A range of measures should be used to examine the influence of other 
major life events on different components of well-being. Such research is not without its 
methodological difficulties, and questions of reliability, validity and response biases deserve 
further investigation. An encouraging start has been made by Andrews (1974) and Andrews & 
Crandall (1976). 

The measures of affect developed by Bradburn have been altered and extended in several 
ways by other researchers. The original application (maintained in this study) was in terms of 
‘during the last few weeks did you ever feel any of these things?’ (yes or no), but Berkman 
(1971) inquired ‘how often do you feel each of these ways?’ (never, sometimes or often), and 
Beiser (1974) employed the three-point frequency response with a focus on ‘the past few 
months’. 

The independence of positive and negative affect appears to be retained, but it may yet prove 
to be an artifact of the questioning procedure or of the specific item content. The meaning of the 
affect scales has been queried by Cherlin & Reeder (1975), who raised the possibility that their 
focus is upon aspects of personal activation level rather than on emotional features to do with 
pleasantness. However, other studies (e.g. Berkman, 1971) have demonstrated a link between 
affect scores and physical health. 

These and other questions to do with the affective aspects of everyday life experience have 
been somewhat neglected by British psychologists. There has long been an interest in ‘abnormal 
psychology’, dealing with the day-to-day experiences, activities, problems and coping processes 
of people defined as mentally ill. Further research into psychological well-being in its several 
aspects might form one strand of the developing study of the everyday life of those who are not 
ill. By juxtaposition, we might even call this ‘normal psychology’. 
Prose and prejudice: Some effects of priming context on the immediate 
recall of information 


A. L. Wilkes and G. Alred 


Two experiments are reported in which recall of the same information is compared following different 
priming passages. In one case the subjects were primed by material that was consistent with the content of 
the main passage; in a second case, the priming introduced information in conflict with it. A further 
condition involving no priming passage was also investigated. It was found that inconsistent priming led to 
more accurate recall of the material involved in the conflict. Consideration of serial position, inference 
responses, reading times and neutral information supported the view that inconsistent priming, in particular, 
elicited deeper levels of analysis for the items of information taken as being relevant. 


The integration of information presented verbally is central to the process of meaningful 
learning, whether formally organized as in education or less formally as in maintaining working 
knowledge of a frequently changing subject area. Despite the importance of pooling information 
over time it cannot be claimed that in the past it has commanded an extensive experimental 
programme in the way that is true, for example, of serial learning. In consequence relatively 
little is known about the psychological demands of accumulating information even though it 
bears upon a basic human skill operative in many natural situations. The main reason for this 
state of affairs has been the late emergence of ‘knowledge’ as a subject of experimental 
concern. Thus Norman (1976) writes of ‘a whole new problem. . .the attempt to state in formal 
terms the representation of knowledge in memory’ (p. 173). The statement refers to events in 
1971 and although seriously undervaluing the contribution of Piaget and his co-workers (e.g. 
Inhelder & Piaget, 1958) and neglecting a substantial body of work within educational 
psychology, if the importance of compatibility with current models of memory is granted, the 
observation carries conviction. 

The materials used in memory experiments have progressed in recent years from serial lists 
through isolated sentences to prose passages, and each step has added to the psychological 
factors said to be operative in the learning task. Work on memory for sentences soon focused 
upon the role of structural variables (e.g. Johnson-Laird, 1974) and studies of memory for prose 
have highlighted inferential processes in addition to passage structure. Thus it has been reported 
that the recall of the content of a second story is facilitated if it shares a common plot structure 
with a prior story (Bower, 1976). Other studies, designed to diagnose the type of inferences 
drawn during the course of reading, led Bower to conclude that ‘understanding some sentences 
in a story requires the listener to build a backwards bridge of inferences to information provided 
by an earlier statement in the text’ (p. 533). 

Passage structure and inferential processing emerge as critical components of the learning 
situation. Similar conclusions concerning structural variables can be found in Gentner (1976), 
and, with respect to inferential processing, in the studies of Bransford and his colleagues on the 
nature of comprehension (Bransford & McCarrell, 1974). In accord with Norman’s comments, 
most of the published work is of recent origin and formal, if tentative, accounts of the 
theoretical structures and processes involved are now available (Schank, 1972; Minsky, 1975). 

Recall of passage content also varies with the learner’s perception of his task. Frederiksen 
(1975) reported that different experimental contexts influenced significantly ‘the amount of 
inferred and overgeneralized semantic information in subjects’ text recalls’ (p. 162), although not 
necessarily the total amounts of recall. The manipulation of context used entailed either 
instructing subjects to process material simply for recall of content or to process the material for 
its application to a specific problem as well as recalling content. Frederiksen’s study is 
representative of one common approach to the experimental study of the effects of learning 
context on memory: that of externally controlling the learner’s processing activity. Other 
examples of this mode of investigation would be studies that set questions for the learner to 
answer (Rothkopf & Billington, 1975) or introduce additional operations such as note taking 
while learning (Carter & Van Matre, 1975). Generally, the variations in recall contingent upon 
such manipulations can be interpreted as reflecting extra attention, or deeper levels of analysis 
(Craik & Lockhart, 1972), directed at relevant parts of the text. 

Manipulations restricted to the internal properties of a passage define a second, 
complementary category of recall studies. While including the work on plot structure and 
inferential processing already cited, this category embraces a variety of units and techniques 
(see also Bruning, 1970; Johnson, 1971; Kintsch et al. 1975). Generally, variations in structural 
and semantic dimensions of text can also be expected to trigger chains of inferences in different 
ways and to influence how the learner identifies salient sections of a passage. The present study 
falls within this second category. It is suggested that differences in the recall of passage content 
will arise if during the course of integrating information, different bridging inferences are invoked 
by changing the relationship between earlier and later items of information. 

Suppose the information to be learned and recalled relates to the same (fictional) individual 
over time. At one instance the information (A+) presents the individual in a favourable light. 
Later information may then refer to the same individual, continuing the favourable account (P+) 
or, as is not unlikely, early promise may not be fulfilled and the subsequent information may 
then be unfavourable (P—). Now suppose that the first body of information the learner 
experiences is either the favourable passage (P+) or the unfavourable (P—). The positive or 
negative information will set up characteristic and different priming contexts for subsequent 
material. If the passage about the earlier period (A+) is only now made available, different 
modes of integration can be expected as the two bodies of information are brought together and 
bridging inferences are drawn. The empirical question relates to the manner in which recall of 
(A+) is influenced by the priming context encountered. 

Of the two integration sequences it might be argued that the consistent pattern, ((P+) (A+)), 
will be associated with higher recall since the passages should be bridged easily so facilitating the 
assimilation of the relevant content. A similar argument applied to the inconsistent sequence 
would predict impairment of recall. An alternative prediction, however, is that experience of the 
inconsistent sequence, ((P—) (A+)), will give rise to temporary problems in assimilation leading 
to more intensive inferential activity and deeper levels of processing for the salient information. 
The following experiments were designed to clarify this position. 


Experiment I 
Method 


Materials. Three passages of prose were prepared; two of these provided different priming contexts and the 
third served as the main passage. The priming passages (Fig. 1) gave a description of the educational 
performance of an individual (John) during late secondary school and higher education. Of the two versions, 
one was favourable, P(+), describing John as conscientious, well motivated and successful, and one was 
unfavourable, P(—), which gave the converse picture. The two versions were of equivalent length (99 and 
100 words) and of similar grammatical construction, differing only in specific descriptive phrases or clauses. 

The main passage, A+, (Fig. 2) was 167 words long and provided information on two main themes: firstly, 
John’s educational performance at primary school and secondary school up to the age of 16 which was 
generally favourable, and secondly, his social behaviour during that period which was described in mildly 
unfavourable terms. These two areas of information are referred to as the educational data base, D1, and the 
social data base, D2. The passage consisted of three chronological segments or sections: (1) primary school, 
(2) transfer to secondary school, and (3) ‘O’ levels. In each segment, two items of information from each 
data base were given. Thus, the two data bases, D1 and D2, were interspersed throughout the passage with D1 
preceding D2 in each section. The passage also contained information which was neutral with respect to the 
data bases. Two such items came at the beginning and four at the end of the passage. Also, one neutral item 
of information preceded each segment of D1 and D2, giving a total of 12 neutral items. 
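
The composition of the main passage can be tallied with a short sketch. The Python below is an illustration only and not part of the original materials: the labels NEUTRAL, D1 (educational) and D2 (social), the function name build_layout and the exact ordering of items inside each segment are assumptions read off the description above.

```python
from collections import Counter

# Hypothetical labels for the three kinds of item in the main passage.
NEUTRAL, D1, D2 = "neutral", "D1", "D2"
SEGMENTS = ["primary school", "transfer to secondary school", "'O' levels"]


def build_layout():
    """Return the assumed order of item types in the main passage."""
    layout = [NEUTRAL, NEUTRAL]          # two neutral items open the passage
    for _segment in SEGMENTS:
        layout += [NEUTRAL, D1, D1]      # one neutral item, then two educational items
        layout += [NEUTRAL, D2, D2]      # one neutral item, then two social items
    layout += [NEUTRAL] * 4              # four neutral items close the passage
    return layout


counts = Counter(build_layout())
print(counts[NEUTRAL], counts[D1], counts[D2])  # prints: 12 6 6
```

On this reading the passage carries 12 neutral items and six items from each data base, which agrees with the total of 12 neutral items stated in the text.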


P(+) 

John, in the final part of his secondary schooling, showed a keen interest in his work and he approached 
the preparation for ‘highers’ with a strong desire to do well. He gained a school award and this strengthened 
his commitment even more. His ‘highers’ results were very good. In 1975 he applied for University entrance 
and was immediately accepted by his first choice. The most recent reports of his progress that are available 
are extremely favourable. His tutors have commented on his scholarly showing in tutorials and have 
predicted, on current form, that he will take a very good degree. 


P(-) 

John, in the final part of his secondary schooling, could not settle down to work and he approached the 
preparation for ‘highers’ lacking any strong desire to do well. He gained no school awards and this weakened 
his commitment even more. His ‘highers’ results were borderline. In 1975 he applied for University entrance 
and after a long delay gained a late acceptance. The most recent reports of his progress that are available are 
extremely unfavourable. His tutors have commented on his weak showing in tutorials and have predicted, on 
current form, that he will take a very poor degree. 





Figure 1. Experiment I: Priming passages. P(+), favourable; P(-), unfavourable. 














John was born in 1957 and was brought up in a rural area. He entered primary school in 1962 and the 
school records indicate that he was above average in class and established a favourable impression with his 
teachers. The comments on his social relationships at 5 indicate a limited set of friends and that he stood 
aside. John transferred to secondary school in 1968. Once there he was assigned to the top stream and his 
reports present him as a rapid learner. His social relationships at 11 were restricted in their range. It was 
noted that he was not popular in class. John sat ‘O’ levels in 1973. It had been thought that he would 
perform well and, at the end of the year, he passed in all subjects. His social behaviour at 16 indicated a 
reserve in dealing with others and few close relationships. His parents are still living and are at the same 
address. John, himself, is now 21 and lives away from home. 


Figure 2. Experiment I: Main passage (material in italics deleted for recall task). 


Procedure. Two groups of ten subjects were used. Subjects were randomly assigned to one of two conditions 
and read either the P(+) or P(—) passage followed by the main passage. Testing was conducted individually. 
Each subject was told that he was to read and recall the content of simple prose passages that formed a 
summary of one individual’s educational history. He was then allowed to read the priming passage through 
once. Immediately afterwards he was asked for free recall of the passage content. Following this he was 
instructed to read and memorize the main passage and when this had been completed, a form of prompted 
recall was employed. In this case a skeletal version of the main passage was prepared with neutral and data 
base items deleted. The material deleted is shown in italics in Fig. 2. During recall the subject wrote down 
the appropriate items in the relevant blanks. Although subjects were told to read each passage only once, no 
other restriction on reading time was imposed and the times taken by each subject were recorded. 


Subjects. These were 20 first-year undergraduate students at the University of Dundee. 


Results and discussion 


Recall analysis: Scoring procedures. Each subject’s written recall of the items in the main 
passage was analysed in terms of four categories: 

1. Accurate - An item was accurately recalled or an equivalent paraphrase given. 

2. Omission - An item was not included in the appropriate segment. 

3. Intrusion - An item was reproduced in the wrong segment. 

4. Inference - An item was replaced by information not specifically supplied. For inference 
judgements, three subcategories were used to differentiate the intensity of responses. Cases 
where the subject intensified, Inf.(+); attenuated, Inf.(—); or left unchanged, Inf.(=), the 
strength of the original statement, were thus distinguished. 

The two authors analysed the recall protocols independently. Disputed classifications were 
noted and resolved jointly. The inference category, which covered all major departures from the 
original, was the most difficult to use. For example, difficulties could arise when having to decide 
if an inference was based on the item it replaced or on the data base as a whole. For instances 
of the latter case, the passage item was scored for omission as well as inference, and the level of 
intensity of the inference was judged relative to the whole data base. In general, a conservative 
policy was adopted: disagreements over inference categorizations, which involved level of 
intensity, were usually resolved by recording an Inf.(=) response. Taking the categories 
separately, of the 126 items classified as accurate, 14 were initially disputed. Of the 100 judged 
cases of omission, 9 were in dispute. In the case of inferences, the proportion of disputes was 8 
out of 50. Intrusions were relatively infrequent and disagreements were involved in 3 out of a 
total of 28 judgements. Overall the percentage agreement between judges was 89 per cent. 


Accuracy in recall of main passage (A+). In accuracy of recall of neutral items the two 
treatment groups were similar (P(+), 10.1; P(-), 9.9). The mean levels of recall for the 
educational (D1) and social (D2) data for each group are given in Fig. 3. 

Figure 3. Experiment I: Mean accuracy scores for the recall of educational (D1) and social (D2) information. 
P(+) and P(-). 


While the groups are equivalent in terms of their total recall, the relative contributions of the 
two data bases are not the same. An analysis of variance with priming context as a 
between-group factor and data base as a within-group factor indicated that priming context was 
not significant (F = 0.037, d.f. = 1, 18, P > 0.1). The main factor of data base was significant 
(F = 24.69, d.f. = 1, 18, P < 0.001) and the interaction of priming context with data base was also 
significant (F = 13.28, d.f. = 1, 18, P < 0.01). 

Thus in both groups the items in D1 were better recalled than the items in D2. However, under 
the P(-) priming condition the superiority of recall of educational items compared with the 
social items was most marked. It would seem therefore that the effect of inconsistent priming 
has been to increase the salience of the educational material (D1) at the expense of the social 
material (D2). The effect of the consistent context, P(+), has been a closer balance in the relative 
salience of both sets of information. While it is clear that the educational material is basically 
more memorable than the social material, inconsistent priming has served to accentuate this 
difference. The greater memorability of the items of D1 in both groups is most likely to be 
related to the manner in which the passages were initially presented, that is, as summaries of 
educational performance. The introductory statement together with the exclusively educational 
content of the priming passage and the primacy of the educational statements in the main 
passage could be expected to boost the salience of D1 in both groups. Nonetheless, within this 
framework the inconsistent priming leads to a much greater imbalance between the accuracy for 
educational and social recall. 

Reference to omission scores indicates that the poorer performance for D2 under the P(-) 
condition arises mainly from this source (P(-), 3.7 vs. P(+), 2.9). Although the inference scores 
are roughly equivalent for the two experimental treatments (P(-), 2.2 vs. P(+), 2.8), the distribution 
of Inf.(+) scores is not equivalent. Under condition P(+) there was a total of four inferences 
judged as intensifying the original content of D1 and seven intensifying the original content of 
D2. The equivalent scores for P(-) were 1 (D1) and 12 (D2), suggesting that subjects 
experiencing the inconsistent priming were more likely to replace the items of the social data 
base with inferences that made the original, somewhat negative comments much more extreme. 
For example, one subject replaced ‘Not popular in class’ with ‘Generally disliked by the class’. 
Another replaced ‘Reserve in dealing with others’ with ‘Anti-social’. 

Analysis of the recall protocols suggested that the items in the social data base were in certain 
cases less distinctive than the parallel items in the educational data base. While this remained 
true for both experimental groups, and hence could not be the primary source of the interactions 
between priming context and data base, it was, nonetheless, decided to rewrite part of the main 
passage and to test the generality of the findings by a further study. The results of this 
replication are reported under Expt. II. 


Experiment II 
Method 


Materials. The main passage was modified to increase the distinctiveness of individual items, particularly 
those in the social data base. For example, the similarity of ‘limited set of friends’ and ‘few close 
relationships’ was removed by replacing the latter with ‘unease in large groups’. It was felt that such 
changes did not alter the general nature of the passage (see Fig. 4). Three neutral items which had been 
given in the main passage skeleton in Expt. I were not given in the skeleton in Expt. II and hence became 
recall items. These were ‘primary school’, ‘secondary school’, and ‘“O” levels’. There were thus 15 neutral 
items in Expt. II. The priming passages remained unchanged. 





John was born in 1957 and was brought up in a rural area. He entered primary school in 1962 and the 
school records indicate that he was above average in class and was seen by his teachers as an alert child. The 
comments on his social relationships at 5 indicate a limited set of friends and that he preferred to be on his 
own. John transferred to secondary school in 1968. Once there his teachers decided that he should be 
assigned to the top stream and his reports present him as a rapid learner. His social skills at 11 were 
underdeveloped for his age and it was noted that he was not popular in class. John sat ‘O’ levels in 1973. It 
had been thought that he would perform well and, at the end of the year, he passed in all subjects. His social 
behaviour at 16 indicated a reserve in dealing with others and unease in large groups. His parents are still 
living and are at the same address. John, himself, is now 21 and lives away from home. 











Figure 4. Experiment II: Main passage (material in italics deleted for recall task). 


Procedure. Two groups, each of ten subjects, were tested under the conditions P(+) and P(—), as in Expt. I, 
and a further group of ten subjects was tested under a third condition P(0) in which no priming passage was 
provided. These subjects were tested only for recall of the main passage. 


Subjects. These were 30 first-year undergraduates at the University of Dundee. 

Results and discussion 


Recall analysis: Scoring procedure. P(+); P(-). Scoring followed the same procedure as that 
used in Expt. I except that one judge scored the recall protocols blind, i.e. without knowing 
from which group any particular subject came. Taking the categories separately, of the 143 items 
classified as accurate, 8 were initially disputed. Of the 195 judged cases of omission, 8 were in 
dispute. In the case of the inference category, the proportion of disputes was 7 out of 76, and 
for intrusions, 3 out of 34. Overall the percentage agreement between judges was 94 per cent. 
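
The agreement percentages quoted for the two experiments can be checked against the category counts given in the text. The sketch below assumes that percentage agreement was taken as the share of judgements not initially disputed, pooled over the four categories; the text does not state the formula, and the names percent_agreement, EXPT_1 and EXPT_2 are introduced here purely for illustration.

```python
def percent_agreement(judgements):
    """Percentage of judgements that were not initially disputed.

    `judgements` maps a category name to (total judgements, disputed judgements).
    This pooled definition is an assumption, not the authors' stated formula.
    """
    total = sum(n for n, _ in judgements.values())
    disputed = sum(d for _, d in judgements.values())
    return 100.0 * (total - disputed) / total


# Counts as reported in the scoring sections of Expts I and II.
EXPT_1 = {"accurate": (126, 14), "omission": (100, 9),
          "inference": (50, 8), "intrusion": (28, 3)}
EXPT_2 = {"accurate": (143, 8), "omission": (195, 8),
          "inference": (76, 7), "intrusion": (34, 3)}

print(round(percent_agreement(EXPT_1)))  # prints: 89
print(round(percent_agreement(EXPT_2)))  # prints: 94
```

Under that assumption the pooled counts reproduce the reported figures of 89 and 94 per cent.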


Accuracy in recall of main passage. P(+) vs. P(-). The comparable data for Expt. II are given in 
Fig. 5 which reproduces the accuracy scores for the revised passage. 


Figure 5. Experiment II: Mean accuracy scores for the recall of educational (D1) and social (D2) information. 
P(+), P(-) and P(0). (Axes: mean accuracy score against data base, D1 and D2.) 


The analysis of variance indicated that the main factor of priming context was not significant 
(F = 2.23, d.f. = 1, 18, P > 0.1), but the main factor of data base was significant (F = 9.19, 
d.f. = 1, 18, P < 0.01). It would seem that despite the alterations to the passage the educational 
information was still more salient than the social information. Comparison of Figs 3 and 5 
indicates a similar pattern of interaction of data base with priming context although for the 
revised passage the effect has been attenuated, with the interaction term on the borderline of 
significance (F = 4.085, d.f. = 1, 18, 0.06 > P > 0.05). The greater accuracy in recall of D1 items 
under condition P(-) is still present but the groups have moved closer together in their degree of 
accuracy in recall of D2. 


Serial position in passage. Groups P(+) and P(—). Within the main passage each data base can 
be chronologically sectioned into information relating to primary school (Age 5), secondary 
school (Age 11) and ‘O’ level (Age 16). The accuracy scores allowed for the interchange of the 
two items from D1 or D2 within each segment but not between segments. Figures 6 and 7 
indicate the levels of recall for the two data bases over each section of the record. 

Within group P(+), analysis of variance with data base and serial position as two repeated 
measures indicated that neither of the two main factors, nor the interaction, reached significance: 
data base (F < 1, d.f. = 1, 9); serial position (F < 1, d.f. = 2, 18); interaction (F = 1.81, d.f. = 2, 
18, P > 0.1). 

Figure 6. Experiment II: Mean accuracy scores for the recall of educational (D1) and social (D2) information 
as a function of serial position. P(+). 

Figure 7. Experiment II: Mean accuracy scores for the recall of educational (D1) and social (D2) information 
as a function of serial position. P(-). 

Within group P(-) the data bases were significantly different (F = 30.00, d.f. = 1, 9, P < 0.001) 
but serial position was not significant (F < 1, d.f. = 2, 18). The interaction of data base with 
serial position was also significant (F = 7.51, d.f. = 2, 18, P < 0.01). 

In sum, under the P(+) treatment condition the data bases are recalled at equivalent levels 
regardless of their position within the passage. In the P(-) condition, however, the items from 
D1 are more likely to be recalled from the later sections of the passage but this is not true for D2 
items. 


Recall of main passage without specific priming. Group P(0). In the third condition, P(0), 
subjects were given the main passage and then asked to recall its content immediately. At one 
level, therefore, P(0) serves as a base-line against which the two priming groups can be 
compared. However, it should be noted that it is extremely unlikely that subjects in P(0) read 
the passage with no a priori views about its theme; a more appropriate description for this 
treatment condition would refer to the degree of priming introduced rather than the absence of 
prior context. Levels of recall are indicated in Fig. 8. 

Figure 8. Experiment II: Mean accuracy scores for the recall of educational (D1) and social (D2) information 
as a function of serial position. P(0). 


The overall recall level for both data bases in P(0) was 3.3. This compares with 4.8, P(+), and 
6.2, P(-): a significant difference between P(0) and P(-) (t = 4.16, d.f. = 18, P < 0.001) but not 
between P(0) and P(+) (t = 1.43, d.f. = 18, 0.1 > P > 0.05). An analysis of variance of accuracy 
scores as a function of serial position indicated that the only significant factor was D1 vs. D2 
(F = 13.96, d.f. = 1, 9, P < 0.01). The main factor of serial position was not significant (F = 2.84, 
d.f. = 2, 18, 0.1 > P > 0.05) nor was the interaction term (F = 2.13, d.f. = 2, 18, P > 0.1). On 
relative accuracy, therefore, group P(0) was closest to the priming treatment P(-), in that the 
educational material was recalled more frequently than the social material although at a distinctly 
lower level. With respect to serial position, however, P(0) was similar to P(+). In sum, items 
from D1 were more memorable than D2 items under circumstances that either introduced 
additional material in conflict with the original or allowed D1 to 
feature as the primary theme of the passage. The introduction of priming material that could be 
seen as consistently related to D1, on the other hand, was associated with a more balanced 
pattern of recall between the two data bases. 

Mean accuracy scores for neutral passage material were: P(+), 11.4; P(-), 11.5 and P(0), 9.8. 
There were no treatment differences in respect of this measure. The comparability of the 
accuracy scores for neutral information, particularly for the two priming conditions, throws into 
relief the differential recall within data bases previously reported. Overall, it seems that subjects 
in P(0) engaged in more superficial processing of the passage content than subjects in either P(+) 
or P(-). Additional support for this view comes from an examination of reading times for the 
main passage. In Expt. II, reading times under P(0) were less than under P(+) and P(-): 53.9 sec 
vs. 77.5 and 80.4 sec, respectively (U = 17 and 19.5; P < 0.05 in both cases). Reading times for 
P(+) and P(-) did not differ. 


Inferences. Groups P(+), P(-), and P(0). Inference data are given in Table 1. Subjects in P(0) 
produced relatively few inferences, again suggesting a low level of processing in the absence of 
a priming context. A significant difference exists in comparison to P(-) (U = 17, P < 0.05) but 
not in comparison to P(+) (U = 27, P > 0.1). 


Priming context and immediate recall 131 


It can be seen from Table 1 that the distribution of Inf.(=) was roughly the same in both 
priming contexts and both data bases. The distribution of Inf.(+), on the other hand, 
indicates an interaction pattern resembling that found for the accuracy data, except that the 
degree of intensification was focused in D2, the social base. Similar results were found in Expt. 
I. Within the P(-) condition there was a greater number of inference responses to D2 items than 
to D1 items (t = 2.57, d.f. = 9, P < 0.05). This was not found in the other conditions and indicates 
that inconsistent priming not only focuses attention on D1, but also influences the manner as well 
as the degree of assimilation of D2. 


Table 1. Total inference responses. Expt. II. P(+); P(-) and P(0) 

        Inf.(=)     Inf.(-)     Inf.(+)     Total 
        D1    D2    D1    D2    D1    D2    D1    D2 
P(+)     8     5     1     3     6     5    15    13 
P(-)     9    10     0     3     2    10    11    23 
P(0)     4     5     3     1     0     1     7     7 
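
The row totals of Table 1 can be recomputed from the individual cells. The sketch below is only a consistency check: it assumes the three column pairs are Inf.(=), Inf.(-) and Inf.(+) in that order, an ordering inferred from the discussion of the table rather than stated outright, and the names TABLE_1 and data_base_totals are hypothetical.

```python
# Each cell holds the (D1, D2) counts for one inference subcategory.
# The column order is an assumption inferred from the accompanying text.
TABLE_1 = {
    "P(+)": {"Inf.(=)": (8, 5), "Inf.(-)": (1, 3), "Inf.(+)": (6, 5)},
    "P(-)": {"Inf.(=)": (9, 10), "Inf.(-)": (0, 3), "Inf.(+)": (2, 10)},
    "P(0)": {"Inf.(=)": (4, 5), "Inf.(-)": (3, 1), "Inf.(+)": (0, 1)},
}


def data_base_totals(row):
    """Sum the D1 and D2 counts across the three inference subcategories."""
    d1_total = sum(d1 for d1, _ in row.values())
    d2_total = sum(d2 for _, d2 in row.values())
    return d1_total, d2_total


for condition, row in TABLE_1.items():
    print(condition, data_base_totals(row))
# prints:
# P(+) (15, 13)
# P(-) (11, 23)
# P(0) (7, 7)
```

The recomputed totals match the Total columns of the table, and the P(-) row shows the concentration of intensifying inferences in the social data base.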





Omissions. Mean omission scores for D1 and D2 were 3.3 and 3.4, P(+), compared with 1.6 and 
3.0, P(-). In the case of P(0) the scores were respectively 3.4 and 4.8. As one would expect, the 
omission scores give roughly the inverse of the accuracy scores, i.e. absence of accurate recall 
usually implies an omission. 


General discussion 


It is proposed that under condition P(-), reading the priming paragraph and its negative 
educational content gives rise to a conceptual structure in the reader from which other 
features of the central character’s educational performance can be inferred. It is not thought that 
at this stage any expectations are clearly formulated; rather, they are potentially available, to be 
called into play if and when new information is assimilated that does not fit within its inference 
range. Thus, on reading A(+), these subjects experience educational statements that were not 
predictable from the earlier context and, as such, a temporary problem in assimilation is created. 
The findings consistent with this interpretation relate to the superior recall of D1 compared with 
D2 (Expts I and II); the increasing separation of the data bases as more of the main passage is 
processed (Expt. II) and the enhanced level of inference responses when reproducing the social 
items (Expts I and II). As more educational items in the main passage are read the inconsistency 
becomes increasingly evident, resulting in additional attention being paid to this source of 
information. At recall the subject is beginning to remove the inconsistency by sharpening and 
intensifying the social commentary. If John is converted from a somewhat shy boy to a child 
with social problems it is less implausible that he should do badly in higher education, having 
done so well beforehand. 

For the subjects in group P(0) the level of recall for both data bases was generally below that 
found for the priming conditions. Inference totals and reading times also tended to be less. This 
result is consistent with the view that the absence of a priming passage and the need for its 
integration with the information of A(+) led to relatively superficial processing. However, these 
subjects did attribute an educational theme to the main passage, in response to the initial 
instructions and the structural properties of the passage itself. 

In the group P(+) the situation is less clear. The content of the educational statements in the 
main passage was not expected to violate inferences that could be drawn from the priming items, 
and it was found that both D, and D, items were reproduced at equivalent levels of accuracy. In 
this respect the educational material was not given the extra priority it commanded in the other 
experimental conditions. Before concluding that the difference between P(+) and P(—) was 
simply due to the factor of conflict, the possibility of proactive interference cannot be entirely 
ruled out. Nonetheless, there is no measure on which P(+) fared worse than P(0). Usually this 
treatment occupied an intermediate position between the no priming condition and that designed 
to introduce conflict. 

Considered in the light of the possible outcomes discussed earlier there is no support for the 
prediction based upon facilitatory effects arising from the integration of similar material. In 
contrast it was found that given conflict, the educational items of A(+), which were at variance 
with the priming items, were remembered relatively clearly. This outcome is thought to have 
arisen because the sequence ((P—) (A+)) invokes in the reader a search for plausible bridging 
inferences. This would be consistent with other work on prose recall which has demonstrated 
the importance of inferential activity even in simple stories (Bower, 1976) and that such activity 
influences the qualitative properties of recall (Frederiksen, 1975). The present experiments have 
not demonstrated that conflict has led to explicit inferential chains linking priming and main 
passage items; rather, given the different patterns of recall for P(+) and P(-), the difference could 
plausibly be attributed to the reader making some efforts in this direction. This would certainly 
account for the distribution of inferences reported. It was noted that the frequency of inference 
responses tended to increase under priming and for the P(—) treatment, changes in the recall of 
social items frequently took the form of plausible reasons for John performing badly in later 
years. The responses categorized as inferences have not been restricted to purely logical 
responses that might be deduced from earlier information. Given the knowledge that John is 
likely to fail his course at the age of 21, it is neither logical nor necessarily sensible that reasons 
be sought in his social behaviour at the age of 11 or 16. It does, however, introduce some degree 
of coherence into the limited information available. Although the present definition of inference 
is very broad it is similar to what Schank (1972) refers to as predictions based upon a ‘world 
view’. ‘[It] is possible to make contextual predictions [when interpreting language] as to the 
content of expected conceptualizations. . . based on a belief system that includes generalized 
rules for operating in the world based upon one’s view of people’ (p. 626). Similarly Thorndyke 
(1976) argues that prose comprehension involves the assimilation of individual sentences ‘into a 
larger framework incorporating implicit, causal, temporal and motivational information’ (p. 444). 
The extent to which such inferential processing is an inevitable component of integration given 
the presence of conflict is unclear. It is likely that the particular passages used in the present 
experiments are ‘atypical’; certainly other material may serve to attenuate the effect. But this is 
not a simple issue of generating sufficient ‘random’ samples of prose (if this has any meaning at 
all); rather, general conclusions will have to await more evidence on how passages of prose are 
integrated in varying contexts and when sharing different relationships. 


Acknowledgements 
This work is supported by the Social Science Research Council. 


References 


BOWER, G. H. (1976). Experiments on story 
understanding and recall. Q. Jl exp. Psychol. 28, 
511-534. 

BRANSFORD, J. D. & McCARRELL, N. (1974). A 
sketch of a cognitive approach to comprehension: 
Some thoughts about understanding what it means 
to comprehend. In W. B. Weimer & D. S. Palermo 
(eds), Cognition and the Symbolic Processes. New 
Jersey: Lawrence Erlbaum. 

BRUNING, R. H. (1970). Short term retention of 
specific factual information in prose contexts of 
varying organization and relevance. J. educ. 
Psychol. 61, 186-192. 

CARTER, J. F. & VAN MATRE, N. H. (1975). Note 
taking versus note having. J. educ. Psychol. 67, 
900-904. 

CRAIK, F. I. M. & LOCKHART, R. S. (1972). Levels 
of processing: A framework for memory research. 
J. verb. Learn. verb. Behav. 11, 671-684. 


FREDERIKSEN, C. H. (1975). Effects of context 
induced processing operations on semantic 
information acquired from discourse. Cog. 
Psychol. 7, 139-166. 

GENTNER, D. R. (1976). The structure and recall of 
narrative prose. J. verb. Learn. verb. Behav. 15, 
411-418. 

INHELDER, B. & PIAGET, J. (1958). The Growth of 
Logical Thinking from Childhood to Adolescence. 
New York: Basic Books. 

JOHNSON, R. E. (1970). Recall of prose as a function 
of the structural importance of the linguistic units. 
J. verb. Learn. verb. Behav. 9, 12-20. 

JOHNSON-LAIRD, P. N. (1974). Experimental 
psycholinguistics. Ann. Rev. Psychol. 25, 135-160. 

KINTSCH, W. (1974). The Representation of Meaning 
in Memory. New Jersey: Lawrence Erlbaum. 

KINTSCH, W., KOZMINSKY, E., STREBY, W. J., 
McKOON, G. & KEENAN, J. M. (1975). 
Comprehension and recall of text as a function of 
content variables. J. verb. Learn. verb. Behav. 14, 
196-214. 

MINSKY, M. (1975). A framework for representing 
knowledge. In P. Winston (ed.), The Psychology of 
Computer Vision. New York: McGraw-Hill. 

NORMAN, D. A. (1976). Memory and Attention. An 
Introduction to Human Information Processing. 
New York: Wiley. 

ROTHKOPF, E. Z. & BILLINGTON, M. J. (1975). A 
two factor model of the effect of goal descriptive 
directions on learning from text. J. educ. Psychol. 
67, 692-704. 

SCHANK, R. (1972). Conceptual dependency: A 
theory of natural language understanding. Cog. 
Psychol. 3, 552-631. 

THORNDYKE, P. W. (1976). The role of inferences in 
discourse comprehension. J. verb. Learn. verb. 
Behav. 15, 437-446. 


Received 5 July 1976; revised version received 2 February 1977 


Requests for reprints should be addressed to Dr A. L. Wilkes, Department of Psychology, University of 
Dundee, Dundee, Angus, Scotland. 


Br. J Psychol. (1978), 69, 135-137 Printed in Great Britain 135 
Child Development. By Geoffrey Brown 
The Social Context of Teaching. By Gerald Cortis 
Personality and Education. By David Fontana 
Learning and Behaviour Difficulties in School. By D. J. Leach & E. C. Raybould 
School Learning: Mechanisms and Processes. By R. J. Riding 
Series title: Psychology and Education; General Editor: Gerald Cortis 
London: Open Books. 1977. Pp. viii+170. £1.95 each. 


These five books form a set published under the general heading of Psychology and Education, edited by 
Gerald Cortis. They are directed at teachers, in order to help them with the sort of problems of learning and 
behaviour with which they will have to deal in the classroom. The authors are lecturers in education or 
educational psychology or educational psychologists. 

It is not easy to say what would be the most useful order in which to read the books. This might depend 
on how much the reader already knows about general psychology. Occasionally, the authors seem to assume 
that the reader knows very little about the subject; for example, two of the authors define a correlation 
coefficient in terms even simpler than most introductory texts in psychology would use, but another author 
goes into detailed description of experiments on short- and long-term memory which would be difficult to 
grasp without some previous knowledge. The reviewer read them in a quasi-random order, choosing the one 
which happened to be on the top of the pile, and thus succeeded in leaving the liveliest and most informative 
one to the last, i.e. Learning and Behaviour Difficulties in School. There is considerable overlap in the topics 
dealt with in all the books. Four of them have chapters with ‘Learning’ in the titles, and all the books have a 
preference for operant learning on the Skinnerian model, and for introversion/extraversion and neuroticism 
as personality characteristics of major importance. 

Personality and Education by David Fontana was read first. He begins with a chapter on personality 
determinants - heredity and environment - dealing very briefly with chromosomes and chromosomal 
abnormalities, and giving some of the evidence from longitudinal studies which seems to show that general 
personality or temperamental characteristics shown in infancy may persist into later life; e.g. infants who are 
‘awkward’ or ‘easy’ to deal with seem to be awkward or easy when they are of school age. He deals with 
environment equally briefly, distinguishing between middle and working class upbringing, the effect of 
‘warm’ or ‘cold’ mothering and of deprivation. The same topics, however, are dealt with more fully and 
clearly by Geoffrey Brown in Child Development. But both Fontana and Brown put more emphasis on the 
severer genetic abnormalities than seems necessary, without indicating how frequently these are likely to be 
seen in the ordinary classroom. Both also quote MZ and DZ twin studies when discussing heredity, with no 
reference to the difficulties arising from Kamin’s criticisms of the major twin studies. 

Fontana outlines the major theories of personality, classifying them as psychoanalytic (Freud), humanistic 
(Maslow, Kelly), and nomothetic (Eysenck & R. B. Cattell) and gives a clear account of each, followed by a 
short description of the kinds of tests used, and a short critical assessment. The level seems about right for 
readers new to these topics. This is followed by a section on personality dimensions and educational 
achievement, which presents the rather inconclusive results of investigations aiming to show the relationship 
between introversion, extraversion, anxiety, and educational attainment. ‘Learning theory and personality’ 
covers classical conditioning (illustrated as always by the unfortunate Albert), operant conditioning, and 
social learning (Bandura). All of these topics are dealt with more fully by Riding, and to some extent by 
Leach & Raybould. Fontana goes on to give a very brief account of how the principles of operant 
conditioning (behaviour modification) may be used by a teacher in the classroom, but this seems simplified to 
the extent of being misleading. He does point out that the technique needs to be used with ‘skill and 
patience and self-control’, but this is hardly sufficient guidance. Leach & Raybould are much more thorough 
in their handling of this topic. 

Fontana goes on to deal with personality and cognition, distinguishing between IQ and creativity, and 
outlining personality variables such as field-dependence-independence, focusing versus scanning strategies, 
reflectivity versus impulsivity (conceptual tempo) and suggesting that these may be important dimensions of 
children’s ways of learning and of solving problems which the teacher could take account of when dealing 
with children. I am not sure that he has done anything more with his very brief descriptions of some of the 
research on these topics than to supply some impressive jargon which may mislead teachers into thinking 
that they understand the children. His last chapter on mental ill-health does this to an even greater extent, by 
describing hyperactivity, psychopathy and mental illnesses with no indication that these are relatively 
infrequent in primary school children. 

Summarizing, though Fontana gives a clear and lively account of personality theories, he is less than 
adequate on learning theories and could be misleading in his account of how some personality variables may 
affect a child’s school attainments. By supplying labels which can all too easily be attached to an individual, 
he may convey an impression that to label is the same as to understand. 

Geoffrey Brown in Child Development covers much the same topics as Fontana, rather more fully on 
hereditary mechanisms. He deals in more detail with cognitive and perceptual development. He outlines the 
work of Vigotski, Bruner and Piaget, giving a fairly full account of Piaget’s developmental stages. He deals 
also with the development of language and with the theories of Braine, Brown & Bellugi and Chomsky about 
language acquisition, as well as the Skinnerian theory. The exposition of phrase structure and generative 
linguistics in two or three pages is too brief and too technical to be easily understood by a reader with little 
background knowledge of the subject. 

He reports also Bernstein’s work on social class and language and accepts too readily the commonly held 
version that working class children cannot use the elaborated code that middle class children learn so easily. 
Though he does refer to the criticisms of this by Lawton, Labov and others, he does not mention the paper 
by Bernstein himself regretting the current interpretation of his work, and pointing out that working class 
children need not ‘compensatory education’ but simply ‘education’ (Bernstein, 1971). Gerald Cortis, in The 
Social Context of Teaching, gives the same kind of account of Bernstein’s work, without mentioning 
Bernstein’s disclaimer. This seems unfortunate, if it confirms the widely accepted opinion that the ‘working 
class’ child is inevitably limited in his intellectual development by the linguistic handicaps of his social class. 

Riding’s book on School Learning: Mechanisms and Processes is pretty heavy going. He relies greatly on 
experimental work on short-term and long-term memory. I am not sure that his accounts of, say, Peterson’s 
experiments, will convince teachers of the relevance of these experiments to classroom learning. The chapter 
on types of learning is, however, more useful and gives a clearer account of learning theories than the other 
books do. He covers cognitive information, strategies, motor skills and social skills, as well as conditioning. 
Still, none of it is very easy to follow. Perhaps the following quotation may illustrate the tone of his writing: 
‘problem solving is monitored by STM acting in an executive capacity. Information about the problem is 
assessed by STM, which then initiates searches of LTM for an appropriate plan. . . After each attempt at 
solving the problem, STM evaluates the result’ (p. 95). The child, trying to solve a problem, seems to have 
been taken over by a new homunculus. 

In a final chapter on planning learning Riding deals with the stating of objectives and the evaluation of 
learning. This topic does not seem to be dealt with in sufficient detail to be useful. I am not at all sure that 
this book will be much use to those teachers who know little about experimental method or the work on 
memory. Though several chapters start off with descriptions of classroom events, the author soon gets into 
the laboratory and seems to lose sight of the children. Mr Riding refers often to Ausubel’s Educational 
Psychology (1968), and it seemed to me, until I examined it myself, that it might be more profitable to read 
Ausubel. 

The most disappointing book in the set is Gerald Cortis’ The Social Context of Teaching. The introduction 
begins promisingly, with the question ‘what actually happens in teaching?’ but the author then goes on to 
discuss the school class as an institutionalized social group, after diversions about dyads, triads, group 
cohesiveness and the various ways of measuring interactions between members of groups. His chapters on 
person perception and classroom communication deal with the stimulus information provided by children 
(physical appearance, expressive and other behaviour, non-verbal communication and speech itself), but all 
so abstractly that it is difficult to relate his account to classroom behaviour. He discusses one piece of 
research into the accuracy of teachers’ perceptions of children’s satisfaction with school, and admits that 
teachers show very little accuracy in making this assessment. He makes no suggestion that a similar degree 
of inaccuracy may be found in teachers’ assessments of other characteristics of the children. He gives little 
space to discussing teachers’ attitudes to children, or their expectations of them. He does not discuss the 
motives which make people become teachers, except indirectly when dealing with schools as organizations. 
Here he seems to be suggesting that a basic motive for choosing to work in such hierarchical organizations 
may be ‘the need to avoid pain’, and ‘the need to self-actualize’ (attributed to Maslow as modified by 
Herzberg). He joins the de-schoolers in emphasizing the effects of the element of compulsion on school 
attendance, seeing this as an unrecognized source of conflict between teachers and pupils, and supporting 
this by a long quotation from Waller, first published in 1932. The generation gap between Cortis and the 
reviewer perhaps accounts for the latter’s resistance to this notion. 

The liveliest and most interesting of the five books is Learning and Behaviour Difficulties in School by 
Leach & Raybould. Its aim is more direct than that of the other books. The authors want to help teachers to 
deal effectively with ‘problematic’ (sic) children, but they first discuss fully teachers’ attitudes to children 
and their expectations of them. They suggest that ‘low expectations’ may act as a self-fulfilling prophecy. 

A teacher faced with a problem of behaviour or of low attainments, they say, might begin by examining 
his own attitudes to the children, using a repertory grid technique to find out what these are, and what 
assessments have been made of individual children. They make it clear that this could be quite a painful 
process for a teacher who has perhaps not been aware of prejudices. After that, in dealing with a child 
who is seen as a problem, the behaviour should be observed systematically, to find out how often and in 
what situations it occurs. This is a preliminary objective assessment, to make sure that the disruptive 
behaviour, for example, does occur as often as the teacher ‘feels’ it does, and really needs treatment. If it 
does, then the steps in a behaviour modification programme are described: establishing the base-line, 
working out the programme to be used, the rewards appropriate for that child and the behaviours to be 
rewarded, recording systematically the changes in behaviour that occur, and evaluating the 
results. 

They make it quite clear that it is not easy to devise and to carry out a programme of behaviour 
modification in the classroom but they also manage to make it seem well worth the effort involved. One of 
the points they do not mention is the phenomenon of ‘the vanishing problem’. This was observed by one of 
the reviewer's research students, while running a ‘workshop’ on behaviour modification with teachers. Some 
of these teachers found that when they made systematic observations of disruptive behaviours (e.g. shouting 
out, etc.) of a ‘target’ child, these behaviours occurred far less frequently than their subjective impressions 
had led them to believe. The reasons for this ‘vanishing’ are not at all clear. I was inclined to attribute it to a 
change in the teacher’s attitude to and reaction to the behaviour, though on the whole the researcher did not 
agree with this explanation. However, it is likely that he will do further research on this interesting problem 
(Harrop, 1977). 

The rest of Leach & Raybould’s book deals with the ways in which behaviour modification programmes 
can be used for remedial teaching and for behaviour difficulties, and how other children in the class, and 
parents, can be involved in the programme for one child. The last chapter deals with prevention of learning 
and behaviour difficulties, beginning with what is known about the incidence of maladaptive behaviour or 
reading difficulties and is particularly useful on the factors within the school which may ‘place children at a 
disadvantage’. These can be summarized as coming from teachers’ attitudes, their feelings of helplessness 
about dealing with some of the problems of school failure or behaviour, concentrating on control and 
discipline instead of motivating children to learn by reward and praise, a reluctance to refer problems to the 
head teacher or to outside agencies; they emphasize the importance of early identification of difficulties. 

This is the best of the five books in the set, mainly because it should make teachers think about their 
attitudes, their presuppositions about children, and about the way they deal with them. Of the other four, 
Child Development, and Personality and Education may provide useful background information; School 
Learning: Mechanisms and Processes is useful in parts, on different kinds of learning and ways of tackling 
problems, but perhaps rather difficult to relate to children in classrooms. The Social Context of Teaching I 
have already called a disappointing book, and by contrast with Leach & Raybould’s which is firmly located 
in the classroom, it seems even more so, in retrospective summary. 


NANCY CRAWFORD 


AUSUBEL, D. P. (1968). Educational Psychology. A Cognitive View. New York: Holt, Rinehart & 
Winston. 

BERNSTEIN, B. (1971). A critique of the concept of compensatory education. In Class, Codes and 
Control, vol. 1. London: Routledge & Kegan Paul. 

HARROP, L. A. (1977). The methodology and applications of contingency management in 
schools. PhD thesis, University of Liverpool. 


Br. J Psychol. (1978), 69, 139-154 Printed in Great Britain 139 


Book reviews 


Talking to Children: Language Input and Acquisition. Edited by Catherine E. Snow & Charles A. Ferguson. 
Cambridge: Cambridge University Press. 1977. Pp. x+369. £8.00. 


One of the arguments put forward to disprove the Skinnerian theory of language acquisition by children, and 
to prove its innate basis, was that normal adult speech is so rapid, complex and irregular as to make it 
impossible for children to learn from it. However, it has now been demonstrated that they do not have to 
depend on ordinary adult conversation, but hear from their mothers, and indeed from other adults and older 
children, a greatly modified form of language commonly known as ‘baby talk’ — the talk of adults to young 
children, not of the children themselves. The authors who have contributed to this book present various 
aspects of its nature and effects. 

Studies such as those of Garnica and Ferguson indicate that baby talk is not characterized solely, or even 
mainly, by its use of special words like ‘dada’ and ‘bow-wow’ which we often associate with it. In the first 
place it is phonologically distinct. It is slower than adult speech, its pitch is higher, rising throughout the 
sentence, and its intonation pattern is exaggerated. Word structures are themselves modified, for instance by 
reduction and deletion of consonants. Sentences are brief and grammatically simple, and are often repeated. 
Thus baby talk appears to be effectively adapted to the limited attention and processing capacity of young 
children, becoming increasingly like adult speech as they grow older. Semantic contents are related closely 
to existing situations and to children’s comprehension of these, often to their remarks about them. 

There is no evidence as to the generality of these characteristics, though it is claimed that baby talk is 
found in languages other than English. Indeed, examples are given from Latvian, by Rūķe-Draviņa; from 
Berber, by Bynon; and from an African language spoken in Kenya, by Harkness. But these seem to relate 
mainly to modifications of actual words. 

However, it appears probable that mothers simplify and adapt their language in order to produce what 
they suppose to be the best method of communicating with their children so that they can understand what is 
said to them and do what they are told. Thus, as Gleason points out, baby talk is constantly monitored by 
feedback from the children’s reactions, positively by their signs of understanding and compliance, negatively 
by their ignoring what is said to them. Several authors maintain that mothers are not much concerned to 
speak in such a way as to enable their children to improve their language. But surely mothers must vary in 
this respect, for instance in the extent to which they correct and expand their children’s utterances; though it 
is possible that the children pay little attention to this. 

Van der Geest argues, however, that the child’s linguistic development is mainly a function of his own 
developing cognitive abilities, affected to only a minor extent by the syntactic forms of the mother’s baby 
talk. Clearly if the latter played any important part in linguistic development, there should be some 
correlation between variations in its characteristics and those of the children’s speech. Newport, Gleitman & Gleitman, 
and Cross, did in fact set out to investigate this possibility. They present interesting and well-designed 
studies in which conversations between mothers and their children were recorded and then analysed for 
structure and content. Cross also assessed the children’s understanding of what their mothers said to them. 
In neither study was there any close correlation between the general syntactic characteristics of the mothers’ 
speech and those of the children’s, except in the case of certain specific surface structures such as the use of 
auxiliary verbs. Though the mothers did adjust the semantic content of their speech and the length of their 
utterances to their children’s understanding, there was again no ‘fine tuning’ of syntactic complexity to the 
child’s linguistic level. Thus it would seem that in the main the children’s speech and comprehension were 
determined by their existing cognitive capacities, and that baby talk was related to these, though not very 
closely. 

However, there were variations between different mothers in the extent to which they adjusted syntactic 
complexity, and it seems possible that these might have been greater had the mothers differed in social class 
instead of, unfortunately, all being middle class. Such variations might have been mirrored in the children’s 
language, as suggested for instance by Bernstein in his theory of restricted and elaborated codes in older 
children. 

It is not always easy to extract the general argument from the contributions of different authors, and 
indeed the introductory chapter by Roger Brown seems to contradict these in one place. There is no index. 
Though much interesting data are presented, there seem to be no general conclusions as to the factors 
involved in syntactic development in children, nor as to the origins of baby talk. Perhaps, though the authors 
do not suggest this, it is a kind of traditional folk-lore, handed on from one generation to another, though 
modified to some extent by current fashion. 

M. D. VERNON 


Toward a Psychology of Reading. Edited by Arthur S. Reber & Don L. Scarborough. Hillsdale, N.J.: 
Lawrence Erlbaum. 1977. Pp. xii+337. £13.50. 


This book consists of eight chapters on different aspects of reading, submitted by different contributors at 
two conferences. As is usual with such material, separate chapters are not coherently related to each other; 
and the editors have not supplied any final linking discussion. Some topics are treated at excessive length; 
others of equal importance are omitted altogether. Though there are useful citations of experimental data, 
some from sources not easily available in this country, findings and conclusions do not appear to go much 
beyond those presented by Gibson & Levin in their Psychology of Reading. 

However, there is an interesting and perhaps significant swing away from emphasis on the visual processes 
involved in learning to read, to the auditory-linguistic processes. In a chapter on diagnostic testing, Calfee, 
after devoting considerable attention to visual tests, concludes that visual discrimination of letters usually 
causes children little difficulty. But Gleitman & Rozin, in two chapters which make up nearly half the book, 
expatiate on the obstacle created by the difficulty of analysing word sounds into their constituent phonemes. 
This they associate with the evolution of writing, in which alphabetic scripts, the most abstract, appear last; 
not perhaps a very relevant argument. Much more important is the extensive evidence as to the difficulty of 
identifying isolated phonemes, and the impossibility of enunciating many of them. Liberman et al. also point 
out that segmenting word sounds into phonemes is harder for young children than is segmentation into 
syllables. Moreover, short-term memory for the phonemic structures of words is often inadequate in poor 
readers. 

Both Gleitman & Rozin and Williams therefore advocate that in learning to read, attempts at analysis of 
word sounds into the highly abstract single phonemes should be preceded by learning to recognize syllables, 
which are the smallest unitary speech percepts. However, analysis into phonemes becomes necessary later, 
and this may still cause difficulty to some children, and even produce illiteracy; though Williams considers 
that it may be alleviated by learning only a few phonemes at a time, and Calfee advocates special practice. 
But these schemes have not been validated; and they do not touch on the possibly greater difficulties of 
subsequently learning irregular and variable grapheme-phoneme correspondences. 

However, with the other main topic considered, the rapid reading of fluent readers, we return to the visual 
processes. Rayner & McConkie, using the method of recording eye movements, claim that the visual span at 
a single fixation is limited to 14-18 letters, that is to say, two or three words. But this seems to have little 
relevance to normal fluent reading in which, as Gleitman & Rozin point out, sampling and chunking take 
place under the influence of the syntactic and semantic information of the text. Brooks attempts to 
demonstrate the significance of the visual pattern in experiments employing artificial alphabets; though the 
relevance of these to the extensively practised normal alphabet seems doubtful. It is true of course that 
reading may be impeded by unusual features in the typeface, as this reviewer demonstrated many years ago. 
Presumably these stimulate excessive attention to the visual display, and hence distract from the more 
important processes of sampling, etc., in reading for meaning. It is interesting to learn from Kintsch that the 
rate of reading and the amount of the content recalled are related to the number of ‘propositions’ presented in 
the text and their degree of organization, rather than to the number of words. Apparently speed of thinking 
becomes the major determinant of speed of reading, as Gleitman & Rozin maintain. Unfortunately there is 
no discussion as to how children acquire the essential processes of fluent reading. 

It is therefore justifiable to entitle this book Toward a Psychology of Reading; but there is still a very long 
way to go. 

M. D. VERNON 


The Development of Cognitive Processes. Edited by Vernon Hamilton & M. D. Vernon. London: Academic 
Press. 1976. £21.00. 


It would appear from this book that the revolution partly instigated by Neisser some ten years ago has now 
nearly swept all before it. Over the last few years many of the concepts concerning information processing 
that have gained currency in studies of adults have been applied in developmental research, with 
consequences documented in great detail in this edited book. While the ideas contained in it undoubtedly 
reflect the Zeitgeist, it would have been interesting if some of the authors had addressed themselves to 
possible deficiencies in the entire information-processing approach. Inter alia, there are certain problems 
associated with maintaining the Aristotelian tradition of conceptualizing the mind as sensory. For example, 
within such a framework it is difficult to explain how it is that memory traces are able to initiate and sustain 
complex motor patterns of behaviour. 

The book is divided into four main sections, the first of which deals with information processing in adults. 
The second, and main, section contains eight out of the 16 contributions and addresses itself to the 
development of cognitive processes in children. The third section is concerned with relationships between 
the subject matter of the second section and intelligence and motivation, and the fourth section deals with 
various impairments of normal processes of cognition. While the first section, comprising chapters by 
Coltheart and by Seymour, is presumably meant to set the scene for what is to follow, there is some doubt 
as to whether this goal is achieved. For example, while Coltheart discusses iconic storage at some length, 
none of the eight subsequent contributions dealing with developmental studies of information processing 
refers to iconic storage at all. 

There appear to be other strange omissions. For example, workers within the field of human memory 
would probably expect a book like this to consider at some length theoretical approaches such as the 
Craik-Lockhart levels of analysis formulation and Tulving’s encoding specificity principle. In fact, there is 
very little about Craik & Lockhart, Tulving is not mentioned at all, and the index does not contain the entry 
‘retrieval’. If it is in fact the case that very little developmental work has concerned itself with these theoretical 
notions, then there are clearly important applications of information-processing ideas to developmental 
research still to be made. 

As mentioned previously, the heart of the book is really contained in the analyses of children’s cognitive 
processes contained in the second section. Of particular interest in this context are the contributions of 
Mackworth, Cromer, Farnham-Diggory, and Reese & Porges. Mackworth proposes a useful three-stage 
developmental theory of attention, with the child progressing from an initial no habituation stage, through a 
stage where novelty is sought and habituation begins to occur, to a stage of greater selection and 
discrimination where ambiguous stimuli are preferentially selected. Cromer, in line with the growing 
emphasis on the child as an active organism, discusses possible strategies employed by children as they 
move towards linguistic competence. Farnham-Diggory discusses various approaches to the development of 
human logical skill, with primary emphasis on Piaget. She shows clearly that some current theoretical 
approaches incorporate Piagetian ideas within a more quantitative and analytical framework. Reese & Porges 
demonstrate that there are various learning phenomena stemming from earlier approaches to children’s 
learning which have still not been satisfactorily explained within an information-processing framework. 

Overall, the book appears to fill an important need, in that it assembles in one place a variegated collection 
of theories that fall within the amorphous rubric of information-processing theories. In spite of some 
inadequacies with the information-processing approach, it is probable that an ever increasing number of 
researchers will find inspiration from it for at least the next few years. The general standard of the 
contributions is uniformly high, and the book demonstrates unequivocally the fast rate of progress in the 
field. One has the impression that the final section of each chapter was intended to be a succinct summary of 
the main points made in the chapter. However, the length of these summaries ranges enormously, from eight 
lines (clearly too short) to five pages (too long?). Finally, while having no definitive evidence about the 
overall accuracy of the referencing, I did just happen to notice that the reference to one of my articles was 
incorrect. 

MICHAEL W. EYSENCK 


Child Psychiatry: Modern Approaches. Edited by M. Rutter & L. Hersov. Oxford: Blackwell Scientific. 1977. 
Pp. xiv+1024. £21.50 


The stated aim of the editors is to provide an accurate and comprehensive account of the current state of 
knowledge in child psychiatry, and to integrate research findings with the understanding which comes from 
clinical experience and practice. A wide variety of theoretical approaches are represented, but the majority of 
the contributors are working at or have had close connections with the Bethlem Royal and Maudsley 
Hospitals or with the Institute of Psychiatry. The result is undoubtedly a volume which is valuable at a 
variety of levels; it presents not only a good deal of practical material and advice, but also critically 
evaluates existing evidence and draws attention to recent developments and areas of growth. 

Rutter’s discussion of classification is a particularly important contribution. With the increasing recognition 
of the need for a multidisciplinary approach, communication becomes vital, and only if there is uniformity 
in the usage of descriptive and diagnostic terms can meaning be attached to clinical reports, research findings 
and hospital statistics. At the same time there is a growing public awareness of the dangers of attaching 
labels, and the author emphasizes that it is erroneous to equate classification of disorders with the labelling 
of people and sees doubts as to the value of any diagnosis or classification as a counsel of despair, which can 
greatly impede the therapeutic process. Clearly, oversimplification has its own deficiencies and he therefore 
presents a multiaxial framework, which when tested in a study sponsored by the World Health Organization, 
was found to make classification more uniform, was easy to apply and corresponded meaningfully to the 
usual clinical approach of the psychiatrists participating. His conclusions on classification form the basis for 
the organization of the material on clinical syndromes which comprises the largest single section. 

It is impossible to do justice in a short review to the 20 chapters on specific syndromes, written as they are 
by authors selected for their ability to speak with authority and knowledge. However, it is possible to say 
that they all combine research data with a consideration of the practical problems surrounding intervention, 
and that the information is presented with a very considerable degree of clarity — a feature not always 
present in a field in which there are as yet no hard and fast answers. 

The discussion on classification already referred to formed part of a section on clinical assessment; this 
also includes coverage of diagnostic appraisal and psychological testing, and it is encouraging to be able to 
identify themes common to both of them. Firstly, that clinical practice is distinguished by its focus on the 
individual, and secondly, the inadequacy of hopefully collecting a mass of data and trusting that some 
pattern will emerge when it is finally put together. In the chapter by Cox & Rutter on diagnostic appraisal 
and interviewing, a problem-solving approach to history taking and observing is advocated. It is proposed 
that the clinician must be formulating hypotheses to be tested from the first moment he meets the family. 
While it is important to identify those features which the disorder has in common with other similar 
conditions, a further process is required to bring out the qualities which are different and distinctive about 
the individual child and his family. Similarly, Berger emphasizes that whatever procedures are being used, a 
systematic approach to the investigation of individual cases should be adopted in which psychological tests 
should essentially be thought of as a means of hypothesis testing. Tests are only of value in as much as 
they form part of a wider process of assessment which in turn is directly linked to treatment. 

A brief but useful section on developmental theories is included which draws attention to the fact that 
controversies between theorists are related not only to the ways in which behaviour and its disorders are 
understood and treated, but also that the models dictate the kind of research which is undertaken. Learning 
and psychoanalytic theories and the work of Piaget are covered and all three chapters can be considered to 
be essential reading. Dare rightly reminds us that while there is no doubt a tendency for psychoanalysts to 
be behindhand with their knowledge of techniques and findings in experimental child psychology, child 
psychologists in turn are very often out of date with what they take to be psychoanalytic approaches to 
development. 

The two other topics which go to make up the five subdivisions of Child Psychiatry are ‘Influences on 
development’ and ‘Approaches to treatment’. The former includes a helpful discussion on the effects of 
brain injury by Shaffer, in which among other things, he endeavours to examine the evidence for and against 
claims that a wide spectrum of abnormal behaviour can be attributed to minimal brain damage or cerebral 
dysfunction. Having established that as a group, brain-injured children are more likely to be psychiatrically 
disturbed, he goes on to look at the possible mechanisms by which such disturbance may be brought about. 
His treatment of this aspect of the problem makes very refreshing reading - even when these children are 
not seen as having problems which are fixed and untreatable, as he feels may sometimes be the case, their 
problems are rarely tackled on as broad a basis as his evidence suggests may be appropriate. 

‘Approaches to treatment’ is something of a mixed bag, grouping together drug treatment, behavioural 
approaches, dynamic treatments, treatment of delinquents, in-patient units and day hospitals and finally 
psychiatric social work. The last deserves special mention as it reflects the major changes which have taken 
place in the theory and practice of psychiatric social workers and the wide range of techniques which are 
now available to them. 

This volume will be of greatest value to those who recognize the need to be continually reviewing their 
ideas and practices if the discipline as a whole is to develop and if an improved service is to be provided. 
Those of us who have learned to cope with the increase in the price of books by scurrying past display 
shelves with averted gaze, might do well to make an exception in this case. 

JEAN ROBERTSON 


Transracial Adoption. By Rita James Simon & Howard Altstein. Chichester: Wiley. 1977. Pp. x+197. £11.00. 


This book contains a history of transracial adoption in the United States, a summary of the handful of 
studies of such adoptions, and a report of the authors’ own survey of 200 white families who adopted 
non-white children. Transracial adoptions were rare in the United States until the 50s, when a movement to 
adopt Asian, mainly Korean, children began. Until the 1975 ‘airlift’ of Vietnamese children, Koreans 
remained the largest immigrant group admitted to the United States for the purpose of adoption. During the 
60s black American children began to be placed in white adoptive homes: this trend reached a peak in 1971, 
and thereafter rapidly declined, until at the present time such adoptions have almost ceased in the USA. At 
no time were large numbers involved: probably 15000 children have been transracially adopted in the United 
States since 1961, the majority of whom were black Americans, with a few thousand Asians and a few 
hundred American Indians. 

The authors argue convincingly that the rise and fall of transracial adoptions was neither the result of 
deliberate planning, nor a response to the success or failure of the adoptions, but rather a reflection of social 
forces, particularly those defining white-non-white relationships. The impact of the Korean and Vietnamese 
wars, and of the civil rights movement led some white families to wish to adopt transracially for 
humanitarian reasons. But the peak of this movement coincided with the peak of the black struggle for 
identity and independence. Black leaders argued that at best transracial adoptions smack of the paternalism 
that has defined white-black relations for so long, at worst they constitute a form of cultural genocide. Third 
World leaders regard the export of their children as a new form of imperialism. 

Quite aside from political considerations probably most people, white or black, are suspicious of 
transracial adoptions on the grounds that the children will be ‘white on the inside and black on the outside’, 
and perceived by both black and white as pariahs. In fact, however, as this book shows, most such children 
are secure and happy in their adoptive homes, and most of the adoptive parents are happy and satisfied with 
them. The authors’ own study of 204 white families who adopted non-white children confirms these findings, 
and suggests that the children tend to have a more positive self-image than black children reared in black 
families. Thus, whilst American black children asked to select the doll they like the best, that looks the 
nicest, and one they would like to be, etc., tend to pick a white rather than a black doll, these black adopted 
children showed no preference for the white dolls. Further, their choice of ‘a doll that looks like you’ 
showed that they perceived themselves as black as accurately as white children perceive themselves as 
white. Young black children in black families, on the other hand, tend incorrectly to identify their own 
colour. 

The authors conclude that if the children continue to be ‘emotionally whole, well-adjusted, and able to 
move easily within and between black and white communities, society’s failure to maintain and support the 
programme will be remembered with deep regret’. This book is particularly relevant to British policy, since 
transracial adoptions, although not frequent, are probably now increasing rather than decreasing. 

BARBARA TIZARD 


Crime and Personality, new rev. ed. By H. J. Eysenck. London: Routledge & Kegan Paul. 1977. Pp. 222. 
£4.50. 


The first edition of Crime and Personality was published in 1964. In this newly revised third edition Professor 
Eysenck shows how several aspects of his theory and research have developed in the intervening 13 years. 
In particular, in their own research on the relationship between personality and criminal behaviour, the 
Eysencks have discovered that the dimension of psychoticism (P) now seems more significant than the 
earlier favoured personality dimensions of neuroticism (N) and extraversion (E); the research findings on 
twins and criminality are expanded by reference to studies of adoption, briefly outlined in the chapter 
entitled ‘The mark of Cain’; and perhaps rather less predictably, in his discussion of ‘Punishment or cure?’, 
Professor Eysenck has largely disowned his former allegiance to aversion therapy in the treatment of 
offenders, in favour of ‘token economy’ methods, helped by the notoriety of the film of The Clockwork 
Orange and the timely assistance of top-security prisoner John McVicar's autobiographical comments on the 
need for ‘thought reform’ for the likes of himself! 

Nevertheless, despite these various signs of change in Eysenck’s theory of crime and punishment, the 
overwhelming verdict on this book in terms of its contribution to contemporary criminology must be that in 
tone and content it is even more of an anachronism in 1977 than it was when it first appeared in 1964. 
Paradoxically, much of the explicit or implicit evidence for such a verdict can be found in the new 
Introduction, which was presumably intended by the author to have an exactly opposite effect. The language 
and sentiments are truly reminiscent of the ‘founding fathers’ of positivist criminology in 19th century 
Europe, with their naive optimism in the ability of ‘science’ to conquer the problem of crime once and for 
all: ‘If we are serious in trying to reduce this colossal burden [of the cost of crime to society] then we must 
first of all have a good theory on which to base our measures... Perhaps the time has come to put our faith 
in the scientific method’ (Eysenck, 1977, p. 14). At one and the same time Eysenck both exaggerates the 
extent to which sociological theories of crime have been ‘prominent and widely accepted’ in the past two 
decades and yet selects for discrediting the most simplistic versions of such theories, e.g. poverty, bad 
housing, which have not been seriously held by criminologists in the form stated for 20 or 30 years. He 
distinguishes between causal analysis and mere correlation, claiming that only psychologists of crime are able 
to provide causal theories, whereas sociologists can only provide us with correlations that prove nothing; 
such a distinction between the disciplines in terms of stereotyped methodologies is clearly unacceptable, and 
denies the variety and potentialities in both sociology and psychology — especially as Eysenck himself often 
bases his own conclusions on similar correlational methods as those used by the sociologists he pillories. 

The Introduction seems deliberately to ignore the major development of the last decade within 
criminology, namely the various types of radical sociological approaches which have emerged, including 
specific critiques of Eysenck’s position such as that in The New Criminology by Taylor, Walton & Young; 
but, just as radical critics like these make token gestures towards the need for a ‘social theory of deviance’ 
to take account of psychological factors, so here Eysenck maintains that ‘in this book we have not 
attempted to deny the importance of environmental factors.’ In fact, there is little real attempt in the rest of 
the work to integrate sociological or environmental factors into what is a genetically based psychological 
theory. 

Eysenck remains an unrepentant ‘determinist’, so that he is likely to be viewed by many as still the best 
representative of ‘neo-positivism’; he states unequivocally that ‘we would regard behaviour from a 
completely deterministic point of view. . .the individual’s behaviour is determined completely by his heredity 
and by the environmental influences which have been brought to bear upon him’ (p. 195). Perhaps the other 
obvious example of the way in which Eysenck is so much at odds with or apparently out of touch with 
contemporary criminology is how he handles the concept and definition of crime itself. Basically, of course, 
crime tends to be interpreted by psychologists in terms of pathological or deviant behaviour; thus, in this 
book, so often the terms criminality and delinquency are found linked to that of ‘psychopathy’. There is a 
passing recognition by the author of alternative definitions of crime, whether of a legal, sociological or 
political kind; and, at a more practical level, of the fact that those labelled and processed as criminals by any 
particular society are not a homogeneous group. This causes Eysenck to limit the applicability of his 
theories to certain types of criminals rather than others, but by doing this he is in danger of arguing in a 
circular fashion, e.g. extraversion explains the crime of extraverted criminals but not that of the introverts 
(pp. 59ff). The basic failure to grapple with and resolve the question of what is being explained remains a 
fundamental flaw in the whole enterprise, whatever else it might seem to achieve purely in its own terms. 

In conclusion, having dismissed several sociological ‘straw-men’ theories as being irreconcilable with the 
general evidence of rising crime rates in the more affluent industrialized societies, Eysenck suggests that the 
answer lies in the increased permissiveness in socialization, drawing particular attention to the way he claims 
that there has been a grand ‘abdication of responsibility’ by parents in America. Not only does this 
conclusion seem a rather poor advert for the much heralded achievements to be looked for from the 
‘scientific method’ but, even if true, would seem to raise as many sociological questions and depend on 
sociological evidence, as it would seem to justify the psychological cause to which the book is dedicated. 

KEITH BOTTOMLEY 


Human Diversity: Its Causes and Social Significance. Edited by B. D. Davis & P. Flaherty. Cambridge, Mass.: 
Ballinger. 1976. Pp. xiv+248. $13.50. 


It must be acknowledged that some scientific issues impinge on public policy: perhaps not as often, nor with 
the impact that some scientists might wish, but sufficiently often to maintain a political fervour in otherwise 
staid scientific debates. A recent example is the attempt in the United States to control research on genetic 
engineering because of possible long-term dangers to humanity. Of all fields, human behaviour genetics is 
likely to be susceptible to disagreements from within and suspicion from without, combining as it does the 
politically controversial areas of genetic differences and human abilities. 

Difficulties lie at the interface between science and politics in the hesitation and distrust felt on both sides. 
The good scientist, imbued with caution and objectivity, is naturally hesitant to espouse a cause too strongly, 
and the politician is uncertain how far to rely on scientific arguments. When this situation occurs in 
connection with an issue which is capable of promoting prejudice and propaganda, it creates the opening for 
demagoguery to masquerade as science and for scientists to lose their perspective. Human Diversity is about 
such an issue, the relationship between race and IQ. The book reports the proceedings of a series of 
discussions which took place in 1973-74 between leading geneticists, psychologists and social scientists on 
the scientific background to the race-IQ issue. 

I must admit that I enjoyed reading the book, despite its style of presentation which was at first irritating, 
but became less so, even addictive after a while - rather like play-reading. To change the analogy, I felt at 
times like the butler at a royal banquet, forced to listen in on an interesting discussion, but unable to stop 
the important gentlemen and ask the pertinent question. One felt that they covered the ground fairly well 
however, without exceeding the comprehension of the average politician or social science student. Three 
seminars, ‘Evolution’, ‘Genetics’ and ‘Behaviour genetics’ provided the background and led into the fourth 
‘Intelligence and group differences’. The latter seminar included discussions by Jensen and Cavalli-Sforza. 
The former is an educational psychologist who has accumulated a large quantity of circumstantial evidence 
for his view that the mean IQ difference between American black and white populations is largely genetically 
based, while the latter is a geneticist who disagrees with Jensen, but whose argument suffered from heavy 
reliance on a weak British study. 

If a politician picks up this book he is likely to be disappointed. Despite the subtitle, social implications 
are barely mentioned and one is left in doubt even about the areas of public policy that might be affected by 
the debate, let alone given any guidance. Of course it is difficult if participants disagree, but surely some 
conclusions are warranted. Pertinent comments in the earlier discussions were made by Mayr who said at 
one point that ‘in dealing with the evolution and meaning of human diversity. . .it would be more meaningful 
to concentrate on individuals rather than on subgroups in the society’. In the later seminars some of the 
more sensible comments are those of Eisenberg, including the point that ‘one cannot conclude that because 
an attribute is psychological or environmental it is readily manipulable, especially in this case where the 
details of the effective environment are so poorly understood’. On the other hand, he said later, it is ‘not 
valid. . .to conclude that the reason that blacks or those in lower social classes do poorly is so completely 
determined by hereditary factors that investment in change or alteration is not worth undertaking’. The 
conclusion must be that social policy-making in education, civil rights legislation or distribution of public 
funds should not be influenced by the IQ debate. There is no outcome that can justify the continuation of 
prejudice, poverty or injustice. 

PATRICK TYLER 


Who Do You Think You Are? By Oliver Gillie. London: Hart-Davis, MacGibbon. 1976. Pp. 255. £4.95. 


I approached this book in the hope that it might prove useful reading for those interested in the present state 
of behaviour genetics and in the nature/nurture issue. At first glance the book looks promising. The fly leaf 
promises a ‘cool, clear examination of the areas of genetic inheritance where the controversy rages most 
rampant: sex, crime, physical and mental health, drug and alcohol dependence and, perhaps most critically, 
intelligence’. 

The book opens with a rather too brief summary of basic genetics, and a useful account of the abuse of 
genetics in the Lysenko affair in the Soviet Union, and the rise of the Nazi eugenics programme. Having set 
the scene, Dr Gillie then proceeds with his main task of demonstrating the variability and flexibility of 
human behaviour and the fact that environmental factors influence our behaviour. This potential behavioural 
flexibility is, perhaps, the main theme of the book, and in tackling the issues of sex, crime, health and 
intelligence, Dr Gillie goes to great lengths to demonstrate the importance of environmental variables, as 
opposed to genetic influences. Unfortunately, it is in this aspect that the book is least satisfactory. The 
author has approached the problem by assuming that scientists interested in these areas are either 
‘hereditarians’ or ‘environmentalists’. Thus, those suggesting a partial genetic component are immediately 
categorized as ‘hereditarians’, and placed alongside Jensen, Eysenck and Shockley. It is this dichotomous 
approach which then influences the remainder of the book. As Dr Gillie’s aim is the laudable one of 
illustrating the relevance of environmental factors, these are concentrated on, and little attention is paid to 
the genetic evidence. The author appears to assume that because environmental factors can easily be shown 
to be relevant, this implies that genetic influences are therefore irrelevant. McClearn has summed up this 
somewhat strange approach, which is not confined to this book: ‘The dichotomous view that a trait must be 
due either to heredity or to environment has also permitted the development of a curious prejudice. . . The 
burden of proof seems to fall on the proponent of a hereditary role. Environmental factors are assumed to 
be relevant unless demonstrated not to be; genetic factors are assumed, a priori, to be irrelevant, and can be 
admitted only after rigorous demonstration.’ 

It is in this dichotomous approach that the book’s main weakness lies. The form of dichotomy adopted 
precludes any serious discussion of genotype-environment interaction, and, in fact, the possibilities of such 
interactions are rarely mentioned. As Dr Gillie states, there are those extreme hereditarians who believe in 
predominantly genetic influences. However, there are also many scientists who argue that complex 
interactions exist between genes and environments. These scientists rarely commit themselves to heritability 
estimates, pointing out that the coefficient of narrow sense heritability is not a constant, but indicates only 
the relative proportion of the variance that is caused by additive gene effects in a particular population, at a 
specific moment, and measured under a particular range of environmental conditions. The nature-nurture 
controversy should be broken down to refer to what characteristics, under what circumstances, and in what 
periods of time. Advocacy of a pure environmentalism or a pure geneticism leads to a misunderstanding of 
the complexity of human behaviour. 

In conclusion, this book is rather disappointing. There is little doubt that Dr Gillie can write fluently, and 
puts the material over in an easily comprehended style. Few would disagree with his argument that 
behavioural flexibility exists and that environmental influences are important. However, in his enthusiasm for 
this message he ignores the fact that human beings are also products of their genes and that resulting 
behaviours may be the result of a complex series of interactions. 


D. F. SEWELL 

MCCLEARN, G. E. (1973). Genetic aspects of alcoholism. In P. G. Bourne & R. Fox (eds), Alcoholism: 
Progress in Research and Treatment. New York & London: Academic Press. 


The Neurophysiology of Memory. By A. R. Luria. Washington: Wiley. 1976. Pp. xvi+372. £17.50. 


This is a translation of two volumes published in Russia under the same title in 1974 and 1976. In terms of 
Western preconceptions the organization of this book is unusual. The first part considers a battery of 
memory tests, which appear to have been assembled shortly after 1960, and their application to different 
categories of patients. The tests include rote-learning of lists of ten words; memory for short lists of two to 
four random words presented once, with and without filled and unfilled delays (30 sec to 2 min); recall of 
sentences; recall of visual material (pictures and symbols); Konorski’s same/different figures test with 
variable delay between presentations; Uznakidzes’ test of retention of a tactual after-effect; recall of stories, 
all the latter also with and without delay and intervening tasks. It is not apparent that all tests were 
administered to all patients, and it is not clear that the patients discussed in the contexts of some test results 
are also discussed in the context of others. 

Data from each of these tests are considered, in turn, for patients suffering from tumours of the pituitary 
gland, from other mid-brain tumours and lesions involving limbic system and hippocampal structures, from 
lesions of the frontal lobes and from lesions of the lateral cortex at various right and left temporal, 
temporo-parietal and parieto-occipital sites. Normal controls were also tested, but their results are not given 
in detail. Overall numbers of controls and normals are given, but results are not broken down for tasks or 
lesion sites. 

The second part of the book follows a different, and slightly more convenient, organizational scheme. 
Chapters deal with patient categories, and describe the results of the same (roughly) memory tests. There are 
three cases of third ventricle damage (ch. V), two massive deep-brain tumours (ch. VI), two cases of 
aneurism of the anterior communicating artery (ch. VII), one case of a lesion involving diencephalic systems 
(ch. VIII) and two cases of massive frontal lobe damage (ch. IX). 

The presentation of case material in both sections is repetitious and disorderly. Extensive use is made of 
raw unedited protocols from test sessions (about three-eighths of the book consists of these). There are 
points which the careful and imaginative reader may be glad to check, there are hints which are enticing, or 
places where disagreement with Luria’s hypotheses is possible. For the most part, however, this redundancy 
merely hinders progress through the book and masks the clarity of Luria’s conceptualizing of the important 
problems he raises. All this is not helped by a clumsy translation which neglects convenient simple English 
to ‘slavishly’ reproduce sentence length, word order, syntactic redundancy, repetitiousness of style and even 
on occasion Russian word forms (e.g. ‘summation’ for ‘summary’). There is at least one obvious 
mistranslation, inverting the sense (p. 84, para 2 line 1). The printers have also done a poor job. 

Nevertheless this book is eminently worth acquiring and studying, and perhaps the best service any 
reviewer can perform is to offer an easier pathway through it than the author and his associates encourage. 

If the reader begins with ch. III (pp. 141-161) he will find a lucid account of Luria’s preoccupations and a 
summary of his conclusions. Following this, the ‘conclusions’ to ch. IV and to each of the succeeding 
chapters (pp. 233-234; p. 254; pp. 296-297; pp. 309-310; pp. 340-341) fill in the details. After this Luria’s 
taxonomy of problems and of theories in his introductory chapter (pp. 1-32) is comprehensible. The decision 
to read the case material may then be taken (or avoided) under favourable conditions. 

If the reader does this he will discover (with considerably less effort than the present reviewer) that a 
number of crucial questions have been raised, and that many provocative data and speculations relevant to 
them have been discussed. 

A main question raised is whether pathological forgetting is better interpreted as the result of accelerated 
trace decay or as the result of interference. Luria’s evidence is that whenever brain damage leads to memory 
defects, these are not greatly exaggerated by the passage of unfilled time between presentation and recall of 
material. However, they are exaggerated by filled delay, most particularly when interpolated material 
resembles material to be recalled. Luria makes a nice distinction between syndromes in which interpolated 
material blocks recall, in which interpolated material intrudes into material to be recalled, and in which (as 
with frontal lesions) interpolated material either prevents ‘switching back’ to earlier material, or appears to 
interfere with control processes organizing order of retrieval. 

In brief, Luria’s position seems to be that trace decay, per se, is not a useful general explanation for these 
disabilities. It is rather failures of selectivity between different trace systems which underlie disturbances of 
function. 

A second general question is whether the distinction between modality-specific and global disturbances of 
memory is useful in understanding function, and whether local lesions produce one defect without the other. 
Another extremely important finding is that left temporal lobe lesions appear to produce modality-specific 
defects in memory as well as in comprehension and recognition. These defects can be illustrated by 
increasing task difficulty, and can be clearly distinguished from more general deficits produced by mid-brain 
or frontal damage. 

Judgements on Luria’s most general contribution, the taxonomy of style of memory deficit by lesion 
location and the relation of particular memory defects to ‘consciousness’, must be left to future generations of 
clinicians. 

Briefly, he shows that pituitary tumours which do not invade ‘the limbic region’ (sic) may cause metabolic 
changes and disturbances of the sleep cycle, but have no overall effects on cognition other than general 
(though relatively mild) memory disturbances. In contrast deep midline tumours involving the limbic system 
and hippocampus show effects related to the classical Korsakov syndrome. Here, with no apparent loss of 
gnosis or praxis, of speech or formal logical ability, there can be severe general memory loss and 
disorientation. It is interesting that Luria finds, in this form of defect though not in others, that recall of 
semantically organized material is no better than recall of lists of isolated words or of random pictures. 

Patients who suffer from these lesions recognize that they have memory defects. In contrast patients 
suffering from fronto-medial damage fail to recognize their loss. With these massive frontal lesions memory 
loss appears to be associated with defects in control processes by means of which events are ordered for 
storage or retrieval. 

In contrast to all these general defects are the modality- or task-specific effects of temporal or parietal 
lesions discussed above. 

This story, crudely abridged here, is endorsed by the case material presented. The Western worker will be 
disappointed by the allusiveness of style, and by opportunities lost. Why were a wider range of tasks not 
used? When cognitive functions are discussed why are standardized procedures for assessment not 
employed? Why is the important topic of rehearsal, crucial for Luria’s distinction between the effects of 
filled and unfilled delays, not considered at all? Why is documentation so excessive in some cases and so 
patchy in others? 

It is perhaps a mark of Luria’s achievement that although all these questions are pressing, they are 
insignificant in view of the importance of his insights and the clarity of the distinctions he proposes. It is 
remarkable that his thought can provoke enough excitement to carry a reader through a very carelessly 
organized book (whether the excitement will be sufficient to reconcile him to paying a swingeing price for a 
badly translated and carelessly produced text is another matter). Luria’s ideas open up useful lines of further 
research. Thus it is churlish to be irritated with the clarity of a mind that incidentally also avoids obfuscation 
with methodology. To complain that Luria’s useful insights are by no means fully substantiated by the work 
he presents is also grudging. The text is an overview of some years of research carried out by Luria and his 
associates. The patients are obviously also described in a large number of journal articles to which adequate 
reference is made. Pending the availability of these articles no judgements can be made. It is enough that 
those who can most benefit have something new and fresh to think about while planning work of their own. 
PATRICK RABBITT 


The Sleep Instinct. By Ray Meddis. Henley-on-Thames: Routledge & Kegan Paul. 1977. Pp. 148. £4.50. 


In The Sleep Instinct Ray Meddis has achieved something which few psychologists do: to write at a 
scholarly depth of analysis, and yet to use a style and vocabulary which make this book easily accessible to 
the layman. He is arguing for a completely new conception of the way in which sleep evolved, and the 
functions it served in evolution, and now. His immobilization theory states that sleep evolved in order to 
keep animals out of trouble at night time, by withdrawing into safe places and remaining still. All the 
phenomena of the sleep drive, or sleep instinct, can be explained in these terms. Drowsiness is the chief 
agent of the sleep-control system, working by lowering our interest in most things. We eventually fall asleep, 
however urgent the demands of the waking world may seem, but not until we have had plenty of warning, 
and time to withdraw to our sleeping place. The fact that acute sleep deprivation disrupts performance by 
increasing the number of lapses into unresponsiveness, rather than by any direct effect on speed or accuracy, 
only supports this point of view. 

The most persuasive and exciting chapters come towards the beginning and end of the book, dealing with 
the origins of sleep, the drowsiness mechanism, and insomnia. Two of the middle chapters are concerned 
specifically with advocating the idea that sleep is now an evolutionary vestige, of no use to man. They are 
fluent and witty, like the rest of the book, but contain instances of sleight of hand which might be more 
obvious in a conventionally written scholarly work. For example he quotes William Dement, saying in 1969 
that paradoxical sleep deprivation had no effect on high-level processes such as learning and memory, 
knowing full well that all the best work, in both animals and humans, in this area was done after that, and 
there is now good evidence for a strong link. Similarly, he has three prize species (Dall’s porpoise, the 
swift, and the albatross) which have been reported not to sleep at all. They are presented as evidence that 
sleep is unnecessary. Many animals, once thought to be non-sleepers, have subsequently been found to 
sleep. As recently as 1965 there was a paper in the Journal of the Veterinary Association of America asserting 
that cattle did not sleep at all. The author had wired up three steers, in individual pens, to a portable 
encephalograph, and not one had shown any sign of sleep for three days. Any cowman could have told him 
that cattle hate to sleep alone, and that in a herd there is always one individual awake and standing, 
apparently acting as ‘sentinel’. It is thus by no means conclusive that the Dall’s porpoise cannot sleep, and 
does not sleep in the wild, when sleeplessness has only been observed in captivity, under laboratory 
conditions. So far as the swift and albatross are concerned, the main evidence seems to be that they spend 
much of their time on the wing, and do not sleep on the ground. He says, ‘It is reasonable to suppose that 
these birds do not sleep on the wing since it would serve little purpose...’, falling into the same trap of using 
a circular argument as do the ‘common sense’ theorists. 

In talking about the effects of sleep deprivation, Dr Meddis argues that sleep might just as well be 
compared to sex as a drive, rather than hunger, since abstinence from all of them has the same general 
effect - ‘After periods of sustained deprivation of both hunger and sex drives, rebound phenomena occur. 
The starving man not only eats more, when given the chance, but also eats with much greater vigour than 
normally. The sex-starved man, likewise, engages in sexual activity in a more sustained and vigorous manner 
when opportunity finally permits.’ There is a very important difference between sleep and hunger, which is 
highlighted by this comparison, made by Marie de Manacéine in 1886: 


Direct experiment has shown that animals entirely deprived of food for twenty days, and which have then 
lost more than half their weight, may yet escape death if fed with precaution - that is to say, in small 
amounts often repeated. On the other hand, I have found by experimenting on ten puppies that the 
complete deprivation of sleep for four or five days (96 to 120 hours) causes irreparable lesions in the 
organism, and in spite of every care the subjects of these experiments could not be saved. Complete 
absence of sleep during this period is fatal to puppies in spite of the food taken during this time, and the 
younger the puppy the more quickly he succumbed. 


Sleeping is therefore fairly obviously linked to the survival of the individual, rather than the species, as sex 
is. Of the three drives, drowsiness (producing sleep) seems most crucial in maintaining life, and 
concupiscence the least. 

Despite this bee in his bonnet about sleep being a waste of time, Dr Meddis has written a very good book. 
His arguments about the evolution of paradoxical sleep as the most ancient sleep, and slow wave sleep being 
a relatively recent addition to the sleep mechanism, seem very persuasive to me. As he says, this way of 
looking at things explains a number of apparent anomalies, such as the appearance of paradoxical sleep 
before slow wave sleep in the foetus, and the loss of thermo-regulation in paradoxical sleep in mammals. The 
how and the why of sleep are, however, two distinct issues. It is a biologist’s credo that all highly 
structured physiological processes have complex functions, even if they are not obvious. Simply asserting 
that there is little scientific evidence for a process having a function cannot be the end of the matter, and is 
not much help. As Ernest Hartmann so eloquently said at the 1975 International Sleep Congress, held in 
Edinburgh, ‘The eye keeps flies off the lachrymal bone... but is that enough?’ 


JAKE EMPSON 
Hypnosis: Trance as a Coping Mechanism. By F. H. Frankel. New York: Plenum Medical. 1976. 
Pp. xiii+185. $14.95. 


In this book Frankel reminds us that some of the most important evidence for the efficacy of hypnosis comes 
from its clinical application, and that the historical roots of hypnosis lie in the treatment of hysteria. It is 
refreshing to read a proponent of hypnosis who is not afraid to admit that the experience of ‘trance’ or 
altered state of consciousness is a central defining feature of hypnosis, making it something more than simple 
waking suggestion or relaxation. This clarifies his position and avoids the semantic jungle that occurs when 
primary suggestibility is assumed to be synonymous with hypnosis, and relaxation is referred to as 
‘non-hypnotic hypnosis’. He rightly points out that unless the emphasis on trance is maintained it is difficult 
to defend the use of a special term like hypnosis. 

Essentially his argument is a revival of Janet’s (1907) proposal that there exist certain similarities between 
clinical symptoms associated with hysteria and the experience of the hypnotic trance. Thus, under hypnosis, 
patients often experience distorted perceptions and anxiety which parallel their ‘waking’ experiences when 
their symptoms are most pronounced. In support of this hypothesis he suggests that hysterics, particularly 
phobics, tend to be highly hypnotizable, and concludes that in certain cases, the hypnotic response is ‘in 
some way, causally related to the symptoms’ (p. 153). Frankel argues that because the symptoms or a 
facsimile of them can be produced in a trance, the trance can be used as a coping mechanism; the rationale 
being that if the patient can create the symptoms in a secure environment he can gain familiarity with the 
problems, understand them, and subsequently gain control over them. Even when the symptoms are not 
causally linked to hypnotizability he argues that the trance may augment the coping mechanism of an 
individual by ‘uncovering his trance capacity and adding to his ego strengths in teaching him how to use it, 
on his own, in self-induced hypnosis exercises’ (p. 153). 

Although Frankel’s idea is an extremely interesting one I feel, unfortunately, that the evidence he provides 
lacks the scientific rigour necessary to support his conclusions in any definitive way. The evidence is 
primarily presented as a series of case histories; the only additional empirical support comes from a study 
indicating that phobic subjects tend to be more responsive to hypnosis scale suggestions than smokers 
seeking help through hypnosis, and patients with multiple phobias are more susceptible than patients with 
single phobias. Although the case histories are well presented and make fascinating reading (they include a 
variety of symptoms from writer’s cramp to multiple personalities) it is difficult to ascertain the extent to 
which hypnosis made a unique contribution to the treatments, and if it did, what features of the hypnosis 
situation were instrumental in producing any improvements. In most of the cases hypnosis was used only as 
an adjunct to other treatments such as desensitization and psychotherapy, and Frankel notes the similarity 
between the uses of imagery and relaxation in hypnosis and their employment in imaginal desensitization. It 
therefore becomes pertinent to ask whether the effectiveness of hypnosis in such cases reflects anything 
more than the operation of the processes involved in imaginal desensitization. He counters this by saying 
that hypnosis accelerates desensitization. Whilst this may indeed be the case, some controlled trials would 
seem necessary to confirm this quite crucial point. Similarly, although he acknowledges the influence of 
variables such as compliance and demand characteristics, and therapist enthusiasm in the hypnosis situation, 
he does not attempt systematically to control these factors. Without additional data on the effects of 
relaxation, waking suggestions, placebos and psychotherapy alone, it is difficult to assess whether 
experience of trance is a significant factor. 

It is quite apparent that Frankel is familiar with these problems and it is regrettable that he does not deal 
with them in more detail. In a sense he has tried to let the case histories speak for themselves when he could 
have possibly reduced the number of case studies and devoted more space to the consideration of alternative 
explanations and anomalies. For instance, it is difficult to reconcile his claims that hysterics are the most 
highly hypnotizable group amongst patient populations with the findings of Gibson and his co-workers that, 
amongst normal populations, neurotic extraverts are the least hypnotically susceptible. I feel that such 
discrepancies deserve at least some discussion. 

Nevertheless, the case histories do speak quite well for themselves, and whether or not we agree with 
Frankel's conclusions, anyone interested in hypnosis should find this a very stimulating book. Also, I feel he 
has performed a valuable service in describing a clinical role of hypnosis in a sensible and unexaggerated 
fashion. 

GRAHAM F. WAGSTAFF 
Methodologies of Hypnosis: A Critical Appraisal of Contemporary Paradigms of Hypnosis. By P. W. Sheehan 
& C. W. Perry. New Jersey: Lawrence Erlbaum. 1976. Pp. xii+329. £13.50. 


Anyone who has some knowledge of the experimental literature on hypnosis may feel justifiably overawed by 
the complexities and contradictions surrounding investigations in this area. Sheehan & Perry have tackled 
this methodological nightmare in a systematic and scholarly fashion. Following an historical introduction, the 
book critically evaluates the approaches of Hilgard, Barber, Sarbin, Sutcliffe, Orne, and London & Fuhrer. 
This particular choice of approaches gives a very comprehensive view of the kinds of methodologies that 
have been applied to hypnosis, and makes the book an excellent source of references. Each approach is 
presented in the form of description of the theoretical position, a survey of relevant strategies of research, a 
summary evaluation, and directions for future research. To summarize, evaluate, and integrate such a 
diverse array of theoretical orientations and experimental designs in a lucid fashion is no mean feat, but I 
think Sheehan & Perry have successfully accomplished this. 

Although I can find little to fault in their expositions of the various approaches, I am not too happy about 
the utility of the ‘heteromethod replication’ technique they propose in ch. 8 for investigations of hypnosis 
and other areas of psychological inquiry. Heteromethod replication involves the application of a number of 
different methodologies to a single problem in order to control for a large range of possible artifacts. The 
particular example they give involves the application of a real-simulator (RS) design and a task-motivational 
(TM) design to the problem of whether hypnotic subjects have a more personal involvement with the 
hypnotist. Although each design is applied to a different task, they conclude that the overall results show 
hypnotic subjects do experience more involvement. The assumed advantage of this approach is that the 
sources of artifact idiosyncratic to each design will somehow be precluded by the controls inherent in the 
other design. For instance, they say that although the experimenter was not blind in the TM design this 
source of artifact can be ruled out because a similar overall result emerged using the RS design when a blind 
experimenter was used. Similarly, the problem of differential personality characteristics which can occur in 
the RS design was eliminated because this source of artifact was controlled in the TM design. It is difficult to 
see the logic behind this conclusion; the TM and RS designs possess different methodological difficulties, 
thus, why cannot the similar results be a consequence of experimenter bias in one task and personality 
differences in the other? It is ironical that Sheehan & Perry use precisely the same argument as a criticism of 
Barber’s assumption of the logic of equivalence, ‘Equivalent effects may in fact be obtained for the two 
conditions but if the variables differ for the ... conditions then comparability of effects may simply reflect 
similar behavioural consequences occurring for different reasons’ (p. 101). Whilst I sympathize with the 
spirit of the approach I am unclear as to how it can be applied to any advantage. 

Although, in general, the discussion gives a full and accurate representation of the current state of 
theorizing there seems to be a rather premature conclusion concerning the importance of compliance or 
‘sham’ behaviour. Although there may indeed be a tendency amongst non-state theorists to de-emphasize the 
sham behaviour aspects of hypnosis and to stress the importance of imaginative processes, the present 
situation hardly seems to warrant the conclusion drawn by Sheehan & Perry that ‘no present day theory 
which conceives of hypnotic behaviour as in some way fraudulent or sham behaviour can expect to be taken 
seriously’ (p. 2), and ‘the question of sham behaviour no longer constitutes an issue in the hypnosis 
literature’ (p. 271). For instance, the findings that subjective test scores frequently tend to be lower than 
objective scores and that hypnotic subjects acting as their own controls may deliberately distort their 
performance to produce a spurious difference between treatments, clearly point to the influence of 
behavioural compliance. Sarbin & Coe (1972) actually describe a case where faked hypnotic behaviour occurs 
with little organismic involvement. I feel there is still a considerable amount of controversy regarding this 
issue and more research is needed to establish the extent of its influence. 

Nevertheless, there is far more to commend in the book than to criticize. One has to admire the 
thoroughness of the whole exercise and appreciate the many hours the authors must have spent sifting and 
analysing the voluminous literature. I found the analysis of the models of Orne and Sarbin particularly 
adroit. Although the overall impression they give is one of more sympathy with the ‘trance’ or ‘state’ 
interpretations of hypnosis all the models receive a useful share of criticism which should please readers of 
all persuasions. 

The price is rather high and anyone looking for sensational reports of hypnotic wart induction, surgery and 
reincarnation will find little of interest in this book. They will also probably be put off by the rather staid and 
technical style in which it is written. However, as a text on methodologies in hypnosis I imagine it will 
probably become a classic in the field. 

GRAHAM F. WAGSTAFF 
Machine Intelligence 8. Edited by E. W. Elcock & D. Michie. Chichester: Ellis Horwood. 1977. Pp. 630. 
£24.00. 


The subtitle of this collection is ‘Machine Representations of Knowledge’ and that topic is considered in ten 
sections. The sections whose titles most clearly promise their being related to psychology are: ‘Problem 
solving and deduction’, ‘Inductive acquisition of knowledge’, ‘Perceptual knowledge’, ‘World knowledge for 
language understanding’, and ‘Dialogue transfer of knowledge to humans’. The remaining sections are 
collectively about the same length and certainly have their interests, too. Indeed the range of machine 
intelligence work today emerges when trying to spot where in the book one may best find the psychology. 
Sections with titles like ‘Programming tools for knowledge representation’ sound unlikely ground in which to 
find psychological gems. Papers like van Emden’s ‘Programming with resolution logic’ confirm that surmise. 
However in his beautifully structured exposition of using first-order predicate logic for high-level programs 
one gains so much that to be without some definitely psychological flashes really matters little. In the same 
section is a paper by Davis & King, ‘An overview of production systems’, that certainly does acknowledge 
work on short-term memory and usefully reviews related computer studies. In papers like Alan Mackworth’s 
‘How to see a simple world...’ are discussions that cognitive theorists may find congenial or not, but which 
they can hardly ignore. The other papers of the section on ‘Perceptual knowledge’ pursue the details of how 
to process pictures of polyhedra. Despite the section in which they appropriately appear these papers are 
less obviously germane to psychology than many others in the book. 

The section entitled ‘Dialogue transfer of knowledge to humans’ is so psychological that the machine 
intelligence themes pale by comparison. There are three papers and Robert Davies is an author of two of 
them, once alone and once with three collaborators. In the summary of the second paper the authors admit 
that artificial intelligence has been in effect an incidental. They sought to use the flexibility of the PLATO 
educational computer system, a Genevan approach to human learning, and the actions of children and other 
learners to demonstrate that machine instruction is not linear, does not depend on S-R links, and is far from 
passive. The papers by Charniak, on ‘Inference and knowledge in language comprehension’, and by Schank, 
on ‘Representation and understanding of text’, make up the final section of the book. To this psychologist 
with an interest in language the best were kept until the last. It would be easy to fit into many current views 
of cognition Schank’s general idea of ‘forgetting heuristics’. His term makes its points forcefully because of 
his vivid examples of what text needs to be open to comprehension. Charniak has a similar happy facility. 
This final section contrasts with one on ‘Inductive acquisition of knowledge’. In its two papers on 
chess-playing programs, by Negri and Michalski, are some impressive results of assessments of end game 
positions by computer. The contrast lies in the means of getting those results and the awareness of problems’ 
settings that Schank and Charniak offer. My intuition is that few of us have such well-based, albeit 
intuitive, knowledge of chess as we have of language. A surprising hint comes from this: language may be 
more tractable than chess in machine intelligence studies which, of course, seek to make knowledge explicit! 

The 29 authors of this collection are, to a person, from departments whose title at most suggests an 
overlap with psychology and often does not even do that. In their 26 actual papers are a good scattering and 
the occasional abundance of offerings of interest to psychologists. This appraisal leads to the question of 
whether to recommend spending £24.00 on their book. It is pleasingly printed on good quality paper, the 
binding is pleasant, and such helpful items as the Contents and the Indexes are well compiled. The 
references appear after individual contributions; this can be mildly irritating because where these end is not 
as obvious as where the book ends. To find the notes on papers immediately after them is more appealing. 
The book’s many diagrams and formulae are generally presented clearly. We have in this book a widely 
based collection that is well edited and produced. These real assets easily outweigh the odd predilection for 
what an SRC official said years ago (cf. Good, pp. 172/3) or the perhaps not always wholly apt lines from 
famous poets (cf. Waldinger, pp. 94-134). All the same, the book costs a lot. Perhaps the best idea will be to 
refer to it and have libraries purchase it for our use. On the other hand, readers who are commonly and 
principally concerned with machine intelligence would be well advised to purchase MI8 for themselves. 
GODFREY HARRISON 


The Psychology of Place. By David Canter. SDa Architectural Press. Pp. 198. £5.95. 


Like other professions, architecture in the late 1960s turned to other disciplines, in particular to sociology 
and psychology, for help in the design process. There was a period of courtship and infatuation. There was 
talk of multidisciplinary design teams, a broadening of architectural education. Canter himself called for 
psychologists to be ‘actively involved with the design process rather than merely passing comments or 
making measurements of something already built’. But in fact there was more misunderstanding as to the 
contribution each was to play. The marriage has never taken place, and with the adverse public criticism of 
post-war architecture, and now the severe financial cutbacks in building we are seeing a withdrawal of 
architecture. There is a crisis within architecture — a confusion. 

I share David Canter's concern that the same architectural and urban forms are being reproduced 
throughout the world - the international style rules. Can we create places more appropriate to their 
inhabitants? We have only to look at the new architecture of the Middle East to see the same mistakes are 
being made — the loss of a vernacular architecture appropriate to its inhabitants, place and climate. I 
unfortunately do not see David Canter’s book being able to arouse the passion again. 

The motivation behind Canter’s book is his concern as to ‘how people make sense of and cope with their 
surroundings’ — a concern shared with other disciplines, notably architecture, planning and geography. The 
psychology crept in because of the need to understand the ways in which we represent ‘places’ in our 
‘heads’. Canter deals first with the theoretical origins used to explore people’s cognitive systems, Bartlett's 
serial reproductions, Lee’s neighbourhood lines, and Lynch's sketch maps. Lynch’s work has become the 
standard reference for architects and planners and yet it is interesting to note that Lee was working in this 
field in Britain before Lynch. In exploring the methods available to examine the cognitive systems people 
have for dealing with places Canter relies on much of the research work he carried out in the Building 
Performance Research Unit in the University of Strathclyde, particularly their appraisal of the Royal 
Hospital for Sick Children, Glasgow. I would have expected a more detailed account of the study by Goodey 
& Lee on Kingston-upon-Hull to have been included here rather than being left to a passing comment at the 
end of the book. After a review of the evidence Canter comes to the conclusion that it appears that people act 
as if there are ‘maps’ in their heads, not the one-inch-to-the-mile Ordnance Survey map, but certainly not 
the series of ‘pictures’ of places as is often held. Having established the concept that we carry maps in our 
heads Canter naturally follows on to look at the relationship of the patterns of distance estimation derived 
from cognitive systems and the differing perspectives which exist between people in their concept of place. 
There is the danger of mismatch between designer and user. Finally, Canter allows himself licence as author 
to look at the role of the designer in creating a sense of place. I would agree with much of what Canter 
says about designers though I do not think he could get much agreement from architects in general. I agree 
with his observation that many architects are themselves confused and that the reduction in concern for places 
came with the ‘modern movement’. I would however say this is true of the older generation of architects. 

I think architects do not have an understanding of how society works, of how people live and how people 
want to live. The designer’s role is seen by Canter as the official modifier and creator of physical forms - the 
goal of the design process is the creation of places. The designer should be aware of his role. The process of 
innovation for which he has been trained will mean that often other associated disciplines will be unable to 
contribute. Canter’s ideas for generating new procedures for designing and producing places are to me 
utopian. 

Having read Canter’s book I find myself suffering from what can be termed the ‘Canter effect’ - an 
over-exposure to Canter. Canter will soon replace Kilroy. I have a fear that every ‘place’ I visit in the future 
will have ‘Canter was here’ or ‘Canter rules OK’ sprayed on the walls. 

JAMES LOWE 


Multivariate Analysis in Behavioural Research. By A. E. Maxwell. London & New York: Chapman & Hall: 
Monographs on Applied Probability and Statistics. 1977. Pp. 164. £3.95. 


To psychologists familiar with Chapman & Hall Monographs on Applied Probability and Statistics (formerly 
Methuen) and to Professor Maxwell’s introductory statistical textbooks for behavioural scientists, it need 
only be said that this new volume maintains the very high standard of both. 

Based on a course of lectures on multivariate analysis given annually by the author to postgraduate 
students and research workers in the behavioural sciences, it offers to readers of similar background both a 
preparation for further study of the use of relevant computer packages and assistance towards the study of 
more advanced books on this topic. It succeeds well in both aims, providing a solid bridge to understanding 
of complex statistical models for students and research workers of limited mathematical background. 

Although introductory, however, the book is not elementary. It requires a working knowledge of matrix 
algebra and, for those seeking revision or instruction in this area, devotes a full chapter to necessary 
principles. The wisdom of using matrix notation in the exposition of principles and strategies of multivariate 
analysis is apparent in the resultant display of basic structures common to all multivariate techniques. This 
structural approach, assisted by the author’s lucid and scholarly exposition, clearly facilitates the 
understanding which the book aims to provide. 

This is further strengthened by the author’s refusal to provide details of computational techniques, except 
where necessary for understanding, or to provide examples and exercises for practice. Now that there are so 
many statistical techniques available and a proliferation of computer packages, it is unreasonable to expect 
every research worker to be able to carry out the computations necessary for any specific analysis. But it is 
reasonable, indeed it is necessary, to insist that users should be fully acquainted with the assumptions and 
limitations of any technique they use so that they do not mistake or misunderstand the implications of the 
resultant analysis. 

Such an approach also brings into prominence problems related to the reliability of collected data. In the 
early years of multivariate analysis, especially of factor analysis, practitioners were well aware of, and very 
concerned about, the problems posed by the application of precise models to data in which errors of 
measurement were clearly apparent. Decades of textbooks in which computation has been emphasized at the 
expense of understanding have led to a widespread neglect of this problem by psychologists. Professor 
Maxwell makes it a central issue of his book, pointing out ways of assessing the effects of measurement errors 
and underlining the dangers inherent in models which make no allowance for them and indicating the nature 
and extent of allowance made by the models which do. 

The bulk of the book is concerned with the classic techniques of multivariate analysis. After two 
introductory chapters covering the historical background and general problems of these techniques, and the 
chapter devoted to matrix algebra, chapters are devoted to principal components analysis, factor analysis 
(including confirmatory factor analysis), multiple linear regression analysis, canonical correlation, 
discriminant function analysis and canonical variate analysis, the analysis of contingency tables, univariate 
and multivariate analysis of variance. The book ends with a chapter by B. S. Everitt covering cluster analysis 
and miscellaneous techniques, including recent developments in the visual representation of multivariate data. 

The exposition of analysis of variance in matrix notation brings into sharp focus the logical structures it 
shares with regression analysis. This should help to destroy the myth among psychologists, fostered by the 
early association of regression analysis with the psychology of individual differences and of analysis of 
variance with the psychology of species specifics, that regression analysis is concerned only with survey-type 
research whilst analysis of variance is concerned with analysis of experimental data. 

In an ideal world, no psychologist would be allowed to use any computer package without a certificate to 
show that he had a working knowledge of the material in this textbook; in the present circumstances it must 
suffice to suggest that it should be recommended reading for all courses in psychological research methods. 
A. B. ROYSE 


Social Exchange Theory: Its Structure and Influence in Social Psychology. By J. K. Chadwick-Jones. London: 
Academic Press. European Monographs in Social Psychology, no. 8, edited by H. Tajfel. Pp. vi+431, 
£11.80. 


This book attempts to give an account of all the work in social psychology which has been based directly on 
theories of social exchange propounded by Thibaut & Kelley, Homans and Blau, whose influential books 
were published between 1958 and 1965. 

The author painstakingly follows chronological order as he progresses from a description of Thibaut & 
Kelley’s game theoretical approach through Homans’ deceptively simple economic propositions, to Blau’s 
scholarly examination of exchange relations in organizations. Critics of the theories are given generous 
treatment, and this gives rise to some interesting discussion of general questions of method and theory in the 
social sciences. 

One main problem with the book lies in Chadwick-Jones’ fidelity to the aims and intentions of each of 
these writers and their followers, and he is impatient of those critics who expect more from them than they 
attempted. In the end, he is happy to be able to demonstrate that, indeed, exchange theory has ‘paid off’ in 
social psychology. Exchange theory has spawned a multitude of studies, and this is regarded by the author 
as sufficient justification for the whole movement, despite weaknesses in the various theoretical positions. As 
he states, exchange theory is based on analogies from reinforcement and economic theories. This creates 
fundamental ambiguities, as in the case of Thibaut & Kelley, who, as Chadwick-Jones rightly suggests, deny 
‘the stricter game theory postulates while at the same time they accept a general maximizing assumption 
from which logically the stricter postulates could in the end be derived’ (p. 139). 

Some interesting discussion in the concluding chapters provides two pointers to the future. Rapprochement 
is already evident between exchange theory and cognitive approaches in social psychology, as, for example, 
between Homans’ most productive notion of distributive justice and Heider’s balance theory, and in this 
connexion it may be observed that readers (especially from the USA) will be surprised by the somewhat 
scanty account of empirical work on ‘equity theory’, which has of late been exchange theory’s most 
clear-cut success. This development is, to some extent, opposed to the tendency for social exchange theory 
to assume greater importance in sociology and social anthropology. The essentially individualistic approaches 
of Homans, and of Thibaut & Kelley, take the dyad as the unit of analysis, and they both suggest that 
appropriate, balanced and essentially harmonious relationships will prevail. Blau, on the other hand, 
emphasizes that in organized social life and in intergroup relationships, exchange has a more wayward, 
conflictful and competitive cast. At this point one regrets that this volume was apparently written too late to 
make reference to Peter Ekeh’s publication Social Exchange Theory which centres on the distinction 
between ‘restricted’ (dyadic) and ‘generalized’ (collectively organized) exchange. The elaboration of such 
general distinctions might have helped integrate the carefully documented studies, viewpoints and arguments 
which characterize this very useful book, besides more effectively demonstrating the wider context in which 
exchange explanations may operate. 

The author is, however, to be congratulated on producing a volume which does enable stock to be taken 
of the most important movement experimental social psychology has generated since the field theoretical and 
subsequent cognitive approaches inspired by Lewin and his followers. As such, it has no serious competitor. 
G. M. STEPHENSON 


Ekeh, Peter (1974). Social Exchange Theory. 
London: Heinemann. 


New Forms of Work Organization. By Lisl Klein. London: Cambridge University Press. 1976. Pp. 106. £3.95. 


Writing in the 1966 Annual Review of Psychology, R. W. Porter remarked on the general impression he had 
got on reading through the recent literature of the time on personnel management ‘that the individual 
differences oriented investigation of traditional personnel topics tends to ignore social and organizational 
factors’. Whatever way one may account for this kind of over-centred individual/situation approach in 
psychological inquiry, one thing is clear, its persistence has been quite remarkable. However, misgivings 
have been setting in. For instance, in most topic areas of industrial psychology, since about the middle of 
the 1960s there has been a noticeable increase in evidence of more broadly based thinking. ‘Broad spectrum’ 
human relations watchers will be quick to point out here that well before the 1960s quite a lot of research 
had been carried out — at the Tavistock Institute, for instance - which had made clear the inadequacies of the 
conventional approach in this area. True, certainly. But such research was rather against the tide at the 
time. Now it looks as if the tide has turned with priorities being thought out again. This can be seen from the 
new labels now appearing — socio-technical rather than techno-social, quality of working life rather than level 
of industrial output, and so on. 

The general approach to forms of work organization in the present monograph is very much in the broadly 
based tradition one would expect from ‘the Tavvy’— the author is on the staff there. In effect, it is the 
English version of a report which followed an invitation to the author in 1970 from the West German 
Commission for Economic and Social Change, to investigate the organization and design of work in some 
countries of Western Europe. It was also requested that the author attempt to identify some of the problems 
that the introduction of new ideas in the design of work has given rise to. Generally, the author has held 
closely, but not rigidly, to the remit. The result is a very readable, discursive account of what was happening 
at the beginning of the 1970s in several West European countries in regard to developments in the 
organization of work. As well, an historical dimension to some of the main developments has been very deftly 
sketched in. In a short monograph such as this, one has to accept that it is just not possible for the 
author to go in detail into all the main aspects outlined. However, some readers, like the present reviewer, 
would have appreciated more examination of what seem to be identifiable as problems arising from new 
developments in this area. Nevertheless, even here the author can hardly be said to be disappointing. Several 
problem areas are pointed up, including that most intriguing one: the way in which job redesign ‘spoils’ or 
disrupts informal work practices which have grown out of the previous job design. 

On the whole, thus, the monograph can be seen as a short text of working philosophy in regard to job 
design in industry — this, rather than anything in the way of a manual spelling out how new ideas might be 
put into effect. But here the reader on the look out for information of this kind is not forgotten. Field 
experiments and programmes where the new ideas have been tried out are listed and brief details given. 
Finally, for those who had misgivings not so many years ago about the way job design in industrial 
psychology seemed to be taken over more and more by an approach the incantatory cry of which was 
‘prediction and control’, the present text will be particularly heartening. 

JAMES G. McCOMISKY 
Levels of processing: A critique 


Michael W. Eysenck 








The theoretical approach to problems of memory proposed by Craik & Lockhart (1972) is evaluated 
critically. Their conceptual framework has the advantage of directing attention to mental events and 
processes, but there are several difficulties. They suggested that retentivity is a function of the depth and 
spread of processing, but there are no suitable criteria available for indexing either the depth or the spread 
of encoding. Furthermore, encoding depth and spread appear to affect the retrieval component of recall, but 
are largely irrelevant to the determination of retrieval strategies and to the decision component involved in 
recall and in recognition. 








An important characteristic of contemporary approaches to human learning and memory is their 
emphasis upon the range and flexibility of encoding processes. Recent theorists who have argued 
that there is a vital distinction between the stimulus-as-presented and the stimulus-as-encoded 
include Bower (1967), Hyde & Jenkins (1969), Underwood (1969), Shulman (1970), Cermak (1972), 
Craik & Lockhart (1972) and Tulving & Thomson (1973). In view of the putative flexibility of 
input processing, it seems strange that few theorists have considered seriously the possibility 
that output processes might manifest a similar variety. The hypothesis has been proposed (e.g. 
Reitman, 1970; Morton, 1975) that the retrieval of information from memory requires many of 
the processes involved in problem solving, but it has never been developed. This article 
examines critically the most complete theory of the encoding processes involved in learning 
(Craik & Lockhart, 1972; Craik, 1973; Craik & Tulving, 1975), and puts forward some 
suggestions about the processes involved in retrieval. 

In their initial article, Craik & Lockhart (1972) argued that perceptual analysis involves a 
hierarchy of levels or stages of analysis proceeding from the early analysis of physical features 
to the later analysis of semantic features (cf. Treisman, 1964). Craik (1973) proposed that 


the memory trace is one product of these perceptual processes, and that trace persistence is a positive 
function of the depth of analysis. ‘Depth’ is defined in terms of the meaningfulness extracted from the 
stimulus rather than in terms of the number of analyses performed upon it. Greater depth usually implies 
more processing of the stimulus. Thus, with any one type of material it will take more time to carry out the 
further operations required for deeper levels of analysis. When material is held constant, processing time is a 
correlate of depth of analysis and thus of subsequent memory performance [pp. 48-50]. 


Apart from the amount of processing, Craik (1973) also identified stimulus salience or intensity 
and the compatibility of the stimulus with the analysing structures as factors leading to deeper 
processing. The reason that deep levels of processing enhance retention, according to Craik & 
Lockhart (1972), is that deep processing enables the subject to make substantial use of learned 
rules and past knowledge. 

While the main emphasis of the Craik-Lockhart model was on storage operations, Craik & 
Lockhart (1972) referred to the importance of the retrieval conditions: ‘Although the distinction 
between availability and accessibility (Tulving & Pearlstone, 1966) is a useful one, the 
effectiveness of a retrieval cue depends on its compatibility with the item’s initial encoding or, 
more generally, the extent to which the retrieval situation reinstates the learning context’ 
(p. 678). In other words, deep levels of processing may or may not facilitate retention, 
contingent upon the retrieval environment. 
A recent article by Craik & Tulving (1975) has extended the line of reasoning adopted by 
Craik & Lockhart (1972) and by Craik (1973). Craik & Tulving maintained that the depth of 
encoding and the spread or elaboration of encoding within the various encoding domains are 
both important determinants of memory performance. The more features of a word, especially 
the deep semantic features, that are encoded at input, the more elaborate will be the resultant 
memory trace. Craik & Tulving argued as follows: ‘Greater degrees of integration (or, 
alternatively, greater degrees of elaboration of the target word) may support higher retention in 
the subsequent test. Effective elaboration of an encoding requires further descriptive attributes 
which are (a) salient, or applicable to the event, and (b) specify the event more (sic) uniquely’ 
(p. 282). 


The nature of the memory trace 


Two extreme views about the nature of the differences among memory traces can be identified. 
The first, deriving from Hull’s (1943) concept of habit strength and finding modern expression in 
the Norman—Wickelgren strength theory (e.g. Norman & Wickelgren, 1969; Wickelgren & 
Norman, 1966) argues that the important differences are quantitative in nature. The second view 
regards the memory trace as comprising a collection of attributes (e.g. Bower, 1967; Underwood, 
1969), and argues that the major inter-trace differences are qualitative. A notable feature of the 
Craik-Lockhart model is that it attempts to amalgamate these two approaches. On the one hand, 
use of the term ‘depth’, and the claim that depth of processing can be defined in terms of the 
amount of meaningfulness extracted from the stimulus material suggest that traces differ 
quantitatively. On the other hand, Craik & Lockhart (1972) argued that perception involves the 
qualitatively different stages of analysis of physical or sensory features, pattern recognition, and 
the extraction of meaning, and that a major result of this perceptual analysis is the memory 
trace. 

A problem that is posed by this theoretical approach is whether or not the analysis of 
information is continuous or discontinuous. In their original formulation, Craik & Lockhart (1972) 
argued that analysis was continuous; from a similar set of theoretical assumptions, Lockhart, 
Craik & Jacoby (1976) opted for discontinuous processing ‘domains’. 

The most reasonable conclusion seems to be that the quantitative and qualitative differences 
among traces have separable effects on performance. While Craik & Lockhart (1972) accepted 
this point, they failed to specify the theoretical consequences of distinguishing between 
quantitative and qualitative inter-trace differences. In addition, this theoretical imprecision places 
the relevance of some of the experimental evidence cited by Craik & Lockhart in doubt. For 
example, they referred with approval to the work of Hyde & Jenkins (1969) and Johnston & 
Jenkins (1971), in which subjects were required to perform one of a number of orienting tasks on 
a list of words, but were not told that they would subsequently have to recall the words. The 
orienting tasks varied in terms of their processing requirements. The most important determinant 
of the level of recall was whether the orienting task involved the consideration of the meaning of 
the list words (i.e. a semantic task). In general, semantic tasks led to considerably greater recall 
than did non-semantic tasks, allegedly due to the greater depth of processing of the former. 
However, it may be incorrect to interpret such a result in terms of qualitative differences in 
processing. As Tulving & Bower (1974) have pointed out, the data could equally well be 
explained by hypothesizing that semantic tasks merely produce a stronger memory trace than 
non-semantic tasks. In other words, the detection of quantitative variations in recall performance 
cannot be taken as direct evidence for qualitative variations in encoding. The same criticism is 
applicable to Expts IV and V reported by Craik (1973), and to several of the experiments 
reported by Craik & Tulving (1975). 

Craik (1973) and Craik & Tulving (1975) argued that they have produced qualitative differences 
in storage by means of using qualitatively different orienting tasks, and the argument is 
intuitively reasonable. However, M. C. Eysenck (in preparation) used phonemic and semantic 
orienting tasks, followed by a recognition test including homophone and synonym distractors. 
The results indicated that introverts had processed primarily in line with the requirements of the 
orienting task, whereas extraverts had processed both semantically and phonemically 
irrespective of the orienting task. 

The extension of the Craik-Lockhart approach by Craik & Tulving (1975) makes matters even 
more complex. They distinguished between depth of processing and spread of processing, where 
the term ‘spread’ refers to the breadth of analysis within any given level or domain of encoding. 
Both the number of attributes encoded at a particular level and the extent to which these 
attributes are integrated with or relevant to the to-be-remembered event are determinants of 
memory performance, i.e. both quantitative and qualitative inter-trace differences are important. 
While it is asserted that non-relevant spread or elaboration of processing does not facilitate 
retention, there are difficulties of prediction resulting from this formulation. For example, will 
memory performance be higher for a memory trace incorporating several partially relevant 
features or for a trace comprising a smaller number of wholly relevant features? 

It is possible to speculate on the different functions fulfilled by qualitative and quantitative 
inter-trace differences within the framework of Anderson & Bower’s (1972, 1974) theory of free 
recall and recognition. They argued that free recall involves a retrieval component and a decision 
component, whereas recognition involves primarily a decision component. The depth of the 
memory trace may affect its retrievability more than the decision component, whereas qualitative 
differences among traces may affect discriminability and the decision component more than 
retrievability. This possibility is explored further in a later section. 


The measurement of depth and elaboration 


The key concept in the Craik-Lockhart model is that of depth of processing, with deep levels 
of processing involving the extraction of considerable amounts of meaning from stimuli as a 
consequence of cognitive and semantic processing. In view of the vagueness with which depth is 
defined, there is the danger of using retention-test performance to provide information about the 
depth of processing, and then using the putative depth of processing to ‘explain’ the 
retention-test performance, a self-defeating exercise in circularity. 

Craik & Lockhart (1972) proposed a partial solution to this problem, suggesting that deep 
levels of analysis should take more time than shallow levels. Craik (1973) found that questions 
presumably involving shallow levels of processing (e.g. ‘Is the word in capital letters?’) could be 
answered more rapidly than questions necessitating deeper levels of processing (e.g. ‘Is the word 
a member of the following category: ——?’). Very similar results were obtained by Craik & 
Tulving (1975) in Expts I-IV. Furthermore, Shulman (1970), using a totally different paradigm, 
found that phonemic information was more rapidly encoded than semantic information. 

On the other hand, Gardiner (1974) required subjects to search for targets either containing 
particular phonemes or belonging to a semantic category, and found that the semantic-processing 
task was more rapidly performed than the phonemic-processing task. 

In order to handle the various apparently contradictory findings, Craik & Lockhart (1972) 
rejected the assumption that processing invariably proceeds from physical to semantic attributes. 
However, by allowing for flexibility in the order of processing of information, Craik & Lockhart 
reduced the value of processing time as an index of processing depth. In addition, comparisons 
of processing time for different materials are hazardous, since many factors other than 
differences in processing depth may make processing longer with one type of stimulus material 
than with another. Lockhart et al. (1976) have proposed that each level of analysis provides 
evidence which can be used either to confirm or to reject the structural description at the next 
level of analysis. They argued for this viewpoint because, ‘it stresses the notion that structural 
descriptions at any level are as much a product of expectancies and past learning as they are 
products of the current stimulus input’ (p. 78). Thus, with practice, the subject may carry out 
fewer operations. 

Craik & Tulving (1975) obtained various findings indicating that encoding time was not an 
adequate index of depth of processing. For example, in their Expt. V, the subjects were given 
either a complex structural task or a simple semantic task. The structural task took longer to 
accomplish, but the deeper, semantic, task produced higher levels of recognition. In other 
words, reaction time is not a satisfactory index of depth, and the lack of an adequate 
independent measure of depth persists. 

Craik & Lockhart’s (1972) preferred paradigm for the experimental manipulation of processing 
depth was the incidental-learning paradigm, since this gives the experimenter more control over 
the subject’s information-processing activities than is the case with intentional-learning 
paradigms. With intentional learning, a major obstacle to the direct comparison of the 
memorability of information processed at different levels is the difficulty of inducing subjects to 
persevere with suboptimal learning strategies (e.g. Paivio & Yuille, 1969). Several 
incidental-learning studies (e.g. Hyde & Jenkins, 1969, 1973; Hyde, 1973; Eysenck, 1974) found 
that subjects given tasks necessitating semantic processing of the stimulus material showed 
superior free recall to subjects given tasks involving orthographic or phonemic processing. 
However, as was pointed out in the previous section, these studies have not convincingly shown 
that there are qualitative differences in processing under the various conditions. 

A methodologically superior approach was adopted by Cermak, Schnorr, Buschke & Atkinson 
(1970), who instructed different groups of subjects to remember either the meaning or the sound 
of the list words. On the subsequent recognition test, the subjects received pairs of words, each 
pair comprising one list word together with its synonym, its homophone, or an unrelated word. 
In line with a depth approach, the meaning subjects outperformed the sound subjects on all the 
recognition tests. More interestingly, there was a significant interaction between instructions and 
recognition pair type, with the meaning subjects doing better on the homophone pairs than the 
synonym pairs, whereas the sound subjects did equally well on both pair types. This interaction 
provides reasonable evidence that subjects encoded as they had been instructed. More generally, 
qualitative differences in encoding will manifest themselves in interactions between encoding 
processes and retrieval situations, whereas quantitative differences will not. 
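
The logic of this last point can be illustrated with a minimal sketch. The cell values below are hypothetical proportions correct, invented solely for illustration; they are not data from Cermak et al. (1970) or from any other study, and the function names are likewise assumptions. The sketch computes the difference-of-differences contrast for a 2 x 2 design (encoding instruction by recognition pair type): a purely quantitative (strength) difference leaves the contrast at zero, whereas a qualitative difference in encoding produces a non-zero contrast.

```python
# Hypothetical proportions correct, invented purely for illustration.
# Keys are (encoding instruction, recognition pair type).
qualitative_pattern = {
    ("meaning", "synonym"): 0.70,
    ("meaning", "homophone"): 0.90,
    ("sound", "synonym"): 0.65,
    ("sound", "homophone"): 0.65,
}

# A purely quantitative pattern: one group is simply better by a constant.
quantitative_pattern = {
    ("meaning", "synonym"): 0.80,
    ("meaning", "homophone"): 0.90,
    ("sound", "synonym"): 0.60,
    ("sound", "homophone"): 0.70,
}


def interaction_contrast(cells):
    """Difference of differences for a 2 x 2 instruction-by-test design."""
    meaning_diff = cells[("meaning", "homophone")] - cells[("meaning", "synonym")]
    sound_diff = cells[("sound", "homophone")] - cells[("sound", "synonym")]
    return meaning_diff - sound_diff


def instruction_main_effect(cells):
    """Mean advantage of the meaning group over the sound group."""
    meaning = (cells[("meaning", "synonym")] + cells[("meaning", "homophone")]) / 2
    sound = (cells[("sound", "synonym")] + cells[("sound", "homophone")]) / 2
    return meaning - sound


if __name__ == "__main__":
    for name, cells in [("qualitative", qualitative_pattern),
                        ("quantitative", quantitative_pattern)]:
        print(name,
              "main effect:", round(instruction_main_effect(cells), 2),
              "interaction:", round(interaction_contrast(cells), 2))
```

Both hypothetical patterns show an overall advantage for the meaning group, but only the first shows an interaction; a main effect alone could not distinguish the two accounts.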

A fundamental difficulty with the incidental-learning approach to the study of depth theory is 
that it relies upon an a priori allocation of different encoding processes to different levels of 
processing. While it may appear reasonable to consider that semantic tasks involve deeper levels 
of processing than phonemic tasks, decisions are more difficult in other cases. For example, 
pictorial stimuli are usually well recognized on a retention test, suggesting that they have 
received a deep level of processing. This deep level of processing is commonly thought of as 
involving imagery. However, one might equally well argue that imaginal processing of pictorial 
stimuli merely involves the storage of some of the physical attributes of the stimuli (i.e. the 
visual characteristics) and so represents a shallow level of processing. 

Few attempts have been made to manipulate experimentally the spread of encoding. However, 
Craik & Tulving (1975), in Expt. VII, used the technique of asking the subjects to indicate 
whether a tachistoscopically presented word fitted a given sentence. The amount of spread of 
encoding was manipulated by using sentence frames ranging from the simple (e.g. ‘He dropped 
the ———’) to the complex (e.g. ‘The old man hobbled across the room and picked up the 
valuable — from the mahogany table’). It is arguable whether the sentence frames differed 
only, or primarily, in terms of the number of semantic features or attributes they contained. It 
seems likely that the simple and complex sentences may have differed in other important 
characteristics, such as relevance to the tachistoscopically presented word and imageability. 

In the same experiment, Craik & Tulving (1975) also manipulated the relevance of the sentence 
frame to the target word by using sentence frame-target word combinations that were either 
congruous or incongruous. While the experimenters initially made subjective judgements about 
sentence frame-target word congruity, there was independent evidence available in the subjects’ 
responses. They were required to indicate whether or not the target word fitted the sentence 
frame, and their responses were consistent with the experimenters’ judgements. However, it is 
not clear whether a continuum ranging from congruity to incongruity can be identified, nor is it 
known whether the sentence-word combinations differed in ways other than congruity. Since 
spread of encoding has only been investigated at the phonemic and semantic levels, it is unclear 
exactly how spread of processing would be manipulated at other levels of processing. 

In sum, psychologists have available several techniques for ascertaining the nature of the 
subject’s encoding processes. These include Wickens’ (1972) release effect; Tulving’s 
retrieval-cue technique (e.g. Tulving & Pearlstone, 1966); analysis of the types of recognition-test 
errors; and analysis of the types of errors in recall (e.g. Bartlett, 1932). However, with the 
exception of the coarse dichotomy between physical and semantic analyses, no operational 
definition of processing depth is available. 

So far, spread of encoding has been manipulated primarily by varying the sentential context in 
which information is presented. Sentences probably differ from each other and in their relationship 
with to-be-remembered information along more dimensions than is currently recognized. 


Levels of analysis 


Several contemporary theorists, including Craik & Lockhart (1972), have distinguished between 
physical and semantic encoding attributes or features. However, this seems to oversimplify a 
complex reality. For example, consider the work of Wickens (1970, 1972). He used the 
Brown-Peterson technique, presenting items from the same class on a number of trials, followed 
by a shift to a different class. This shift produces an improvement in recall known as ‘release 
from proactive inhibition’ (Wickens, Born & Allen, 1963). Wickens argued that the existence of 
the release effect when a particular word attribute was changed indicated that that attribute had 
been encoded by the subject. If that attribute had not been encoded, then a change in it could 
not affect performance. Methodological and interpretative problems associated with the release 
effect have been discussed by Underwood (1972) and by Eysenck (1977). 

This technique has indicated that there is a considerable variety of memorial attributes. For 
example, Turvey & Egan (1969) produced the release effect by making the size of the display area 
for the to-be-learned item smaller or larger on the shift trial than it had been on the previous 
trials. If this does, in fact, indicate that display-area size is a memorial attribute, then a red dot in 
the top left-hand corner of the screen would probably also qualify as an attribute, and so on ad 
absurdum. 

Wickens (1970, 1972) prevented the number of attributes becoming unmanageable by allocating 
the attributes to a limited number of categories: semantic, physical, and syntactic. Craik & 
Lockhart (1972) argued for a continuum of encoding, thus allowing for an infinite number of 
subtly different possible encodings, whereas Lockhart et al. (1976) proposed various qualitatively 
distinct domains, including the physical, phonemic and semantic. Such categorizations are 
based upon intuitive and subjective judgements rather than acceptable scientific criteria. For 
example, Mandler & Worden (1973) considered the identification of words as nouns or verbs to 
be a semantic processing task, whereas Hyde & Jenkins (1973) argued that judging the part of 
speech of words was a non-semantic task. 

In spite of the difficulty of finding appropriate methods of categorizing word attributes, two 
possible ways involve either the use of factor analysis or of Tulving & Bower’s (1974) reduction 
method. Osgood, Suci & Tannenbaum (1957) used factor analysis to uncover the connotative or 
affective attributes of words, and found that the three major dimensions of affective meaning 
were evaluation, potency, and activity. It is probable that other sets of word attributes would be 
found to cluster together to form a manageable number of attribute dimensions. 

The reduction method of Tulving & Bower (1974), extended by Tulving & Watkins (1975), 
involves providing subjects with two or more different retrieval cues in succession for probing 
each of the memory traces. The basic prediction is that, if the informational content of one cue 
(cue X) is completely included in the information contained in the second (cue Y), then cue Y 
will be totally ineffective in producing recall if cue X has failed to lead to recall. On the other 
hand, if the information content of cues X and Y does not overlap at all, then they should exert 
separate, additive, effects on recall probability. The presence of substantial or complete overlap 
between two attributes or cues would indicate terminological redundancy. 
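
The two limiting cases just described can be written out as a small sketch. It is an illustration of the prediction as stated above, not Tulving & Bower's own formalization: the function name, the case labels and the probabilities are hypothetical, and the no-overlap case is treated as statistical independence of the two cues, which is an added simplifying assumption.

```python
def recall_after_two_cues(p_x, p_y, overlap):
    """Probability that an item is recalled after probing with cue X, then cue Y.

    p_x, p_y : probability that each cue, presented alone, produces recall.
    overlap  : "complete" -> cue Y adds nothing once cue X has failed
               (the limiting case of complete informational overlap);
               "none"     -> the cues act independently (assumed here),
               so cue Y can still succeed after cue X has failed.
    """
    if not (0.0 <= p_x <= 1.0 and 0.0 <= p_y <= 1.0):
        raise ValueError("probabilities must lie between 0 and 1")
    if overlap == "complete":
        p_y_given_x_failed = 0.0
    elif overlap == "none":
        p_y_given_x_failed = p_y
    else:
        raise ValueError("overlap must be 'complete' or 'none'")
    return p_x + (1.0 - p_x) * p_y_given_x_failed


if __name__ == "__main__":
    # Hypothetical cue effectiveness values, chosen only for illustration.
    print(recall_after_two_cues(0.4, 0.5, "complete"))
    print(recall_after_two_cues(0.4, 0.5, "none"))
```

Observed recall to the second cue that falls between these two values would, on this reading, indicate partial overlap in the information carried by the two cues.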

In sum, the popularity of the distinction between physical and semantic attributes does not 
demonstrate its validity. Since the distinction has only tangential evidence in its favour, it seems 
prudent to reserve judgement on the most appropriate categorizations of word attributes. 


Retrieval processes 


Craik & Lockhart (1972) argued that deeper levels of processing would lead to greater trace 
persistence, with the apparently obvious deduction that retention-test performance would be a 
positive function of the depth of processing. However, while this deduction was made and 
confirmed by Craik (1973) in Expts. IV and V, Craik & Lockhart noted that the nature of the 
retrieval situation was important. Indeed, under some circumstances deep levels of processing 
might be associated with poor retention-test performance. 

Craik & Lockhart (1972), in their attempt to describe the memory trace, have focused on input 
operations such as the nature of the stimulus and the instructions presented to the subject. 
However, the greatest understanding of an intervening variable such as the memory trace is 
likely to emerge from a simultaneous consideration of input and output operations. Since any 
single measure of retention is likely to provide us with data representing an amalgam of 
memory-trace variance and test-specific variance, it would seem that the use of two or more 
retention measures is advisable. For example, if we find that a subject on two recognition tests 
can discriminate between the list word and its homophone, but not between the list word and its 
synonym, this surely is more informative than a single recognition test would be. It is surprising 
that Craik & Lockhart did not attempt to anchor their concept of trace depth more securely at 
the output side. 

The point that the memory trace can be more precisely described by a series of retention tests 
than by a single retention test has also been made by Tulving & Watkins (1975). They assumed 
that any retrieval cue is effective only to the extent that its informational contents match the 
information contained in the memory trace, an assumption known as the encoding specificity 
principle. It follows from this that the observed effectiveness of different retrieval cues can be 
taken as evidence about the characteristics of the trace. It also follows that the greater the 
variety of retrieval cues utilized, the more complete will be the resultant description of the trace. 


Experimental evidence: Depth 


Some of the experimental evidence supporting the Craik-Lockhart view of the effects of depth 
of processing on retention has already been mentioned (e.g. Hyde & Jenkins, 1969; Johnston & 
Jenkins, 1971; Craik, 1973), and additional confirmatory findings were discussed by Craik & 
Lockhart (1972). Several other studies could be cited in this context, but especially striking 
results were reported in work on sentence memory by Sachs (1967) and Johnson-Laird & 
Stevenson (1970). Sachs (1967) used intentional learning and Johnson-Laird & Stevenson (1970) 
used incidental learning in one of their conditions, but in both studies the result was that 
subjects retained the semantic features of sentences considerably better than the syntactic 
features over a short retention interval. 

While numerous studies provide general support for Craik & Lockhart, there are also several 
studies providing counter-evidence. For example, Bransford, Barclay & Franks (1972) presented 
their subjects with a sentence such as ‘The woman stood on the stool and the mouse sat on the 
floor beneath it’ and found on a subsequent recognition test that subjects could not discriminate 
between the original sentence and a similarly worded inference (e.g. ‘The woman stood on the 
stool and the mouse sat on the floor beneath her’). An implication of the Bransford work is that 
deep levels of processing may involve the processing of non-stimulus material and this 
elaborative coding may frequently lower retention-test performance. Greater depth will only 
yield superior retention when the distractor items do not incorporate information utilized in the 
initial elaborative encoding. 

Several considerations indicate that memory traces formed as the result of relatively shallow 
processing can be extremely durable. For example, we can frequently recognize a telephone 
caller merely on the basis of tone of voice, even though we may not have heard the voice for a 
long period of time. Furthermore, the fact that most adults can read aloud rapidly from a book 
or newspaper indicates that very long-term storage of phonemic information is possible. Of 
course, it could always be argued that these activities involve relatively deep processing, 
including semantic abstraction from the stimulus, but the issue cannot be settled on the basis of 
the available evidence. 

At an experimental level, Kolers & Ostry (1974) presented numerous sentences visually to 
their subjects, half in normal orientation and half inverted (the sentences, not the subjects), 
followed by a recognition test at a retention interval of between three and 32 days. The subjects 
were initially instructed that the experiment was a study of reading. While semantic information 
was better retained than typographical, or graphemic, information, the finding of most interest 
was that some information about typography was retained after 32 days. Since encoding of 
graphemic information is an example of shallow processing, the longevity of such information 
appears to be inconsistent with the Craik-Lockhart formulation. However, as in most studies in 
this area, the precise nature of the encoding processes is unclear. Since inverted sentences took 
longer to read and presumably required greater effort than normal sentences, subjects may have 
retained information about the effort expended in reading each sentence rather than about its 
typography. 

Bregman (1968) investigated the relative memorability of information varying in depth. He 
presented a series of nouns interspersed with cued recall tests, and found that phonemic, 
graphic and semantic cues were all equally effective. 

Jacoby (1975) manipulated the study encoding of a list of words so as to emphasize either the 
physical (i.e. sound and spelling) or the semantic word attributes. Subsequent testing confirmed 
that the experimental manipulation had been successful. A recognition test indicated that 
physical information was retained over the long term as well as semantic information. 

In general, the effects of encoding depth appear to be greater on tests of recall than on tests of 
recognition. In addition to the results from the recognition studies of Jacoby (1975) and Kolers & 
Ostry (1974), there is the work of Buschke & Lenon (1969). They required their subjects to learn 
a list of words, followed by a forced-choice recognition test involving unrelated word pairs, 
homophone pairs, and synonym pairs. Performance on the homophone and synonym pairs was 
significantly inferior to that on the unrelated pairs, indicating that phonemic and semantic 
information had been stored. More importantly, performance on the homophone and synonym 
pairs was equivalent, suggesting comparable long-term storage of the phonemic and semantic 
word attributes. The small effects of processing depth on recognition-test performance are 
consistent with the hypothesis that depth affects the retrievability of information more than the 
decision or recognition process based upon retrieved or presented information. 

An additional inconsistency between depth theory and the experimental evidence has been 
noted by Craik & Tulving (1975). They gave the subjects in their Expt. VII a variety of orienting 
tasks all calling for semantic analysis of the target words. They found considerable differences in 
free recall as a function of orienting task, the percentage correct ranging from 20 per cent to 
over 80 per cent under different conditions. In essence, semantic tasks involving information that 
was complex and congruous with the to-be-remembered words produced better recall than 
semantic tasks involving simple, incongruous information. Such findings were used by Craik & 
Tulving as evidence that retentivity is a function both of the depth to which information has 
been processed and of the spread or elaboration of encoding at any given level. However, the 
validity of the crucial assumption that the same deep level of semantic processing was used in 
connexion with all the tasks is difficult to assess in the absence of a satisfactory criterion of 
processing depth. 

Further evidence that tasks involving comparable deep level processing may produce significant 
differences in recall performance was obtained by Barclay et al. (1974). Their subjects learned 
sentences in which particular semantic features of key words were emphasized. For example, 
the weight of pianos would be emphasized by the sentence, ‘The workmen lifted the piano with 
difficulty’, whereas the fact that pianos are musical instruments would be stressed in the 
sentence, ‘The musician played the piano with skill’. Cued recall of the target noun was higher 
when the cue referred to the semantic aspect of the noun emphasized in the sentence than when 
it did not. Thus, the experimental evidence indicates that memory performance is affected not 
only by the depth of processing but also by the amount of processing (Craik & Tulving, 1975) 
and the nature of the processing (Barclay et al. 1974) at any given level. 


Experimental evidence: Spread 


Some overlap between this section and the previous one is inevitable, since Craik & Tulving 
(1975) have regarded some of the experimental evidence that is inconsistent with the depth 
hypothesis as evidence for spread of encoding. For example, Craik & Tulving reported ten 
experiments, and the strongest evidence for facilitatory effects of spread of encoding came from 
Expt. VII, discussed in the previous section. Additional supporting evidence has been obtained 
in studies using imagery instructions. Bower (1970) used a paired-associate learning task, and 
instructed his subjects to form either interactive or non-interactive images of each pair. In spite 
of the likelihood that the same depth of imaginal processing was used under both instructional 
sets, interactive imagery produced much higher levels of recall. This may have been due to 
differences in the spread of encoding at the imaginal level. Similar results with a free recall 
paradigm have been obtained by Morris & Stevens (1974), who found that subjects asked to 
form unitized images of groups of words recalled considerably more than subjects asked to form 
separate images of each word. 

A rather different paradigm was used by Frase & Kammann (1974). Their subjects determined 
whether words belonged to a designated category, with different subjects using categories of 
varying degrees of specificity (e.g. foods versus vegetables). The main finding was that 
performance on a subsequent, unanticipated, test of free recall was higher where more specific 
categories had been used, presumably because a greater spread of processing at the semantic 
level was necessitated by the more specific categories. 

In spite of the supporting evidence, there are experimental situations in which spread of 
encoding does not enhance memory performance. The work of Bartlett (1932) and of Bransford 
et al. (1972) shows that semantic elaboration of presented stimulus material can have a 
detrimental effect on both recall and recognition. The crucial factor appears to be whether the 
subject at the time of the retention test can discriminate between the stimulus material presented 
to him at input and his own additional elaborative encoding. In the case of the work of Craik & 
Tulving (1975), the subjects had to make the relatively straightforward discrimination between 
the contextual sentences and the target nouns. Morris & Stevens’ (1974) subjects had no 
problems of discrimination, since the spread of encoding merely involved forming relationships 
among to-be-recalled items, and Bower’s (1970) subjects were presented with half of each 
interactive image and asked to supply the other half. On the other hand, Bransford et al.’s (1972) 
subjects had to make the difficult discrimination between presented sentences and very similarly 
worded sentences of closely related meaning. In sum, elaborative encoding can have detrimental 
effects if (a) the distractors are highly similar to the stored trace, or (b) the retrieval cues are 
incompatible with the encoded unit. 

A further factor limiting the extent to which depth and spread of encoding are predictive of 
retention-test performance was investigated by Craik & Tulving (1975). They argued that the use 
of similar encoding operations for several different target words might have detrimental effects 
on memory performance through an overloading of retrieval cues. In their Expt. VIII, they 
varied the number of words exposed to physical, phonemic, and semantic processing, and found 
that the proportion of phonemically encoded words recognized was inversely related to the 
number of words receiving phonemic processing. In order for elaborative processing to facilitate 
retention-test performance, the resultant memory traces must be discriminable from other, 
related, memory traces. 


Is depth a crucial variable? 


With some exceptions that have been mentioned in previous sections, there is reasonable 
evidence that deep levels of processing do seem to produce greater retentivity than shallow 
levels of processing. However, while this result seems intuitively obvious, it is actually quite 
difficult to account for it satisfactorily. Craik & Lockhart suggested that deep processing allows 
the subject to make substantial use of learned rules and past knowledge. The two forms of 
encoding most frequently compared are phonemic and semantic, and so our discussion will 
consider alternatives to the assumption that differences in retention of the two forms of encoding 
are due to the fact that semantic processing is deeper than phonemic processing. 

One possibility is that phonemic information is more poorly retained than semantic 
information because it is exposed to more highly similar interfering information during the 
retention interval. This possibility becomes more plausible if we assume that interference is a 
function of similarity, and that there is a much greater variety of possible semantic encodings 
than of phonemes. The latter assumption seems likely in view of the limited number of 
phonemes in the English language. 

A second alternative interpretation assumes that the successful recall of words in most 
laboratory experiments depends heavily upon contextual tagging of the list words at input. Since 
all the words presented to the subject are known by him, he can only subsequently discriminate 
between the to-be-remembered words and the other words he knows on the basis of ancillary 
information, or contextual tags, stored with each word. Jacoby (1974) has argued, with 
supporting evidence, that phonemic encoding is relatively invariant across different situations, 
whereas semantic encoding is context dependent. In other words, the semantic encoding of a 
given word in a given situation is different from the semantic encoding of the same word in a 
different situation, and so one semantic encoding is discriminable from prior encodings of the 
same word. This trace discriminability may be lacking in the case of phonemic encoding. 

A third interpretation of the results is that subjects spontaneously attempt to generate 
semantic rather than phonemic retrieval cues when engaged in list recall. It is probably the case 
that everyday life far more frequently requires the retrieval of semantic than of phonemic 
information. It is essential to note that these three interpretations emphasize quite different 
variables to the depth-of-processing approach, and have not been compared systematically with 
it in the research literature. 

It is noteworthy that the recent development of the levels approach has seen an extension of 
the theoretical concepts utilized. Craik (in press) argued, with supporting evidence, for the 
importance of four determinants of memory performance: depth of processing; elaboration of 
encoding; congruity between an event and its encoding context; and uniqueness of the link 
between retrieval information and the encoded event. It is probable that phonemic and semantic 
processing typically differ with respect to most (or all) of these factors. 


Summary and conclusions 


The depth approach is part of a general, and largely welcome, shift of interest in memory 
research from external, or situational, stimuli to internal mental events and processes. The 
contrast between the two approaches is well expressed by Craik & Tulving (1975): 


In the experiments we have described here, these important determinants of the strength of associations and 
traces were held constant: nominal identity of items, pre-experimental associations among items, intralist 
similarity, frequency, recency, instructions to ‘learn’ the materials, the amount and duration of interpolated 
activity. The only thing that was manipulated was the mental activity of the learner, yet, as the results 
showed, memory performance was dramatically affected by these activities [p. 292]. 


In spite of the valuable contribution made by Craik and his associates, it is apparent that some 
modifications to the theory are necessary. It is probable that much more attention should be 
given to the processes involved in output. Let us start by assuming that the recall of information 
from memory involves the following five stages or processes, which occur in serial fashion: 

(1) Presentation of a retrieval cue (e.g. instructions to recall; presentation of context). 

(2) Rule or strategy formulation: the subject decides how the subsequent search process 
should operate, the size of the search set to be used, and so on. 

(3) Item search: the search process specified by the chosen rule or strategy is implemented 
and produces one or more items. 

(4) Evaluation: the products of the search process are evaluated against both the current rule 
or strategy and the experimental requirements. 

(5) Emission: those responses positively evaluated are emitted (i.e. spoken or written down). 
[It should be noted that while the processes are assumed to occur serially, information obtained 
from the retrieval and evaluation stages (e.g. about failures and repetitions) may lead to the 
formulation of new rules and strategies.] 
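
The serial stages listed above can be sketched as a toy program. This is a minimal illustration under stated assumptions rather than a model proposed in the text: the function names (`recall`, `all_items`, `spectrum_only`), the strength values, the tags, and the threshold are all hypothetical, and the feedback from later stages to rule formulation mentioned in the bracketed note is not modelled.

```python
def recall(memory, cue, strategy, threshold=0.5):
    """Toy serial version of the five stages.

    memory:   dict mapping item -> {"strength": float, "tags": set of str}
    cue:      context tag supplied at test (stage 1)
    strategy: function (memory, cue) -> list of items to search (stage 2)
    """
    # Stage 1: the retrieval cue is given by the caller as `cue`.
    # Stage 2: rule or strategy formulation fixes the search set.
    search_set = strategy(memory, cue)
    # Stage 3: item search yields items in the search set that are strong enough.
    found = [item for item in search_set
             if memory[item]["strength"] >= threshold]
    # Stage 4: evaluation keeps only items that satisfy the task requirement,
    # represented here by carrying the cue as a tag.
    accepted = [item for item in found if cue in memory[item]["tags"]]
    # Stage 5: emission of the positively evaluated responses.
    return sorted(accepted)


def all_items(memory, cue):
    """Hypothetical undirected strategy: search every stored item."""
    return list(memory)


def spectrum_only(memory, cue):
    """Hypothetical directed strategy: search only items tagged 'spectrum'."""
    return [item for item, trace in memory.items() if "spectrum" in trace["tags"]]


# Invented traces for illustration only.
memory = {
    "brown": {"strength": 0.9, "tags": {"colour"}},
    "indigo": {"strength": 0.6, "tags": {"colour", "spectrum"}},
    "red": {"strength": 0.9, "tags": {"colour", "spectrum"}},
    "table": {"strength": 0.9, "tags": {"furniture"}},
}

print(recall(memory, "colour", all_items))      # ['brown', 'indigo', 'red']
print(recall(memory, "colour", spectrum_only))  # ['indigo', 'red']
```

In this sketch the strongly stored item 'brown' is emitted under the undirected strategy but not under the directed one, because the second stage has excluded it from the search set before any item search takes place.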

The main novelty of this proposed five-stage process is the second stage of rule or strategy 
formulation, which antedates the search for specific items. Tests of thinking and intelligence 
frequently depend heavily upon suitable rule formulation for their successful completion. For 
example, Bartlett (1958) used a test in which subjects were given the following: ‘A 
BY...HORRIBLE’. The difficult part of this problem is the formulation of the two rules that 
each successive word should start with the next letter in the alphabet, and that each successive 
word should comprise one more letter than the immediately preceding word. When these rules 
have been discovered, the retrieval of appropriate words to fill the gap is relatively simple. 

While intelligence-test items usually emphasize the rule-formulation stage rather than the 
item-retrieval stage, the opposite tends to be the case in experiments on memory. Indeed, in 
many cases the rule or strategy employed is merely to retrieve the memory trace or traces most 
strongly associated with the experimental context. However, some memory paradigms allow 
more scope for the use of retrieval strategies. For example, verbal fluency tasks which require 
subjects to retrieve spontaneously as many items belonging to a specified category as possible 
(e.g. four-footed animals) can be approached in a variety of ways. Morton (1975) asked his 
subjects to retrieve colour names, and found evidence for two rather different rules or strategies. 
Some subjects tended to recall the more familiar colour names, and appeared to be producing 
strong associates of the word ‘colour’. Other subjects seemed to use ‘scientific’ knowledge, 
exemplified by their recall of the colours of the spectrum. These latter subjects did not show any 
great tendency to recall familiar colour names, presumably because they were basing their 
retrieval upon a general rule or strategy. The implication is that what is recalled is determined 
not only by the strength of representation in memory but also by the retrieval strategy adopted. 
More specifically, the relationship between encoding depth and probability of recall will be 
closer when a relatively ‘automatic’, undirected search process is used than when a more 
complex and directed search rule is adopted. 

The importance of rule formulation can also be seen in some studies of episodic memory. 
Santa, Ruskin, Snittjer & Baker (1975) presented in random order the words belonging to four 
conceptual hierarchies. At recall, two groups of subjects received the same set of cue words, 
either in such a way as to make evident the hierarchical structure of the input list or randomly 
arranged. There was a significant difference in recall between the two groups, with the 
superiority of the hierarchical subjects presumably being due to their ability to formulate an 
appropriate rule or strategy. 

A reasonable hypothesis is that the depth of processing primarily affects the stage of item 
search or retrieval, rather than the stages of rule formulation or evaluation. An item may be 
strongly represented in memory, but may fail to be recalled because it is not included in the 
memory search set. This would explain Morton’s (1975) finding that some subjects recalled the 
uncommon word ‘indigo’ but not the common word ‘brown’. The proposal that depth of 
processing mainly affects retrieval can also be considered in the light of those studies employing 
retention tests that minimize the importance of the search process (e.g. recognition tests and 
certain cued-recall tests). Several such studies obtaining small or no differences in retention 
between physical and semantic information have been discussed already (Bregman, 1968; 
Buschke & Lenon, 1969; Kolers & Ostry, 1974; Jacoby, 1975). In addition, Nelson & Brooks 
(1974) found that rhymes and synonyms were equally effective as retrieval cues, and Light 
(1972), using quasi-recognition tests, found that homophones were actually better retrieval cues 
than were synonyms. Furthermore, one of the experiments reported by Lockhart et al. (1976) 
found no effect of orienting task on recognition, although encoding depth did have substantial 
effects on subsequent recall of the same information. 

The position is complicated by other findings, such as those of Craik & Tulving (1975). In 
several experiments they obtained large effects of depth on recognition memory. In their 
experiments, there was no systematic relationship between the target items and the distractors, 
which may have reduced the importance of the evaluation stage. An alternative theoretical 
approach was favoured by Lockhart et al. (1976), who distinguished between two retrieval 
modes: reconstruction and scanning. Retrieval information can either be used as the basis for 
reconstructing an encoding of the original event, or recent episodic traces are scanned for the 
presence of some salient feature (cf. visual search). They argued that scanning was more likely 
to be used at short retention intervals, and suggested that ‘no coding differences are found when 
subjects use the scanning or selector retrieval strategy, but that coding differences emerge when 
the reconstruction strategy is used’ (p. 88). While short retention intervals are thought to 
enhance the probability of the scanning strategy being used, it is as yet unclear exactly what are 
the determinants of strategy selection. 

While encoding depth may mainly affect search processes, there is some evidence that it also 
influences evaluation or the response criterion. For example, it has been found (Ingleby, 1973) 
that the response criterion is lower for common than for uncommon names presented in a story. 
Schwartz (1975) argued that the reason for this was that deep levels of processing include 
gaining access to information about what constitutes a rare or a common name, and that this 
information biases the respondent towards common names. In a similar experiment, but 
including conditions designed to reduce semantic processing via the use of white noise, Schwartz 
(1974) found that the main effect of noise was to equalize the response criteria for common and 
rare names. Schwartz (1975) also compared recognition performance of grammatical sentences, 
anomalous sentences, and random-word strings, where the deepest level of encoding presumably 
occurred with the grammatical sentences. A signal-detection analysis indicated that the subjects 
employed a riskier criterion for sentences than for either anomalous sentences or random-word 
strings. It thus appears that semantic processing has two separable effects: (1) a semantic code is 
added to the episodic memory trace (i.e. there is increased encoding depth and spread); (2) a 
bias towards certain, but not other, responses is established. 
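
The separation of retention from response bias that a signal-detection analysis provides can be sketched as follows. This is an illustration under the standard equal-variance Gaussian model, which is an assumption here; the text does not state which measures were computed in the studies cited, and the function name `sdt_measures` and the rates used are hypothetical.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion_c) under the equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1.
    """
    if not (0.0 < hit_rate < 1.0 and 0.0 < false_alarm_rate < 1.0):
        raise ValueError("rates must be strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa               # sensitivity
    criterion_c = -0.5 * (z_hit + z_fa)  # lower values mean a riskier criterion
    return d_prime, criterion_c


# Invented rates: the same sensitivity with opposite response biases.
risky = sdt_measures(0.9, 0.3)
cautious = sdt_measures(0.7, 0.1)
print(risky, cautious)
```

With these invented rates the two conditions yield the same sensitivity but criteria of opposite sign, which is the kind of pattern that allows a bias effect to be distinguished from a difference in what has been stored.
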

The most striking results in support of the depth approach to memory have been obtained by 
Hyde & Jenkins (1969), Craik (1973) and Craik & Tulving (1975) in situations that emphasized 
the search stage of recall. The use of list words which either belonged to a small number of 
common categories or were selected at random would allow little scope for the development of 
complex search strategies. The utilization of retention tests not requiring the discrimination of 
test items from highly similar distractors would limit the importance of the evaluation or decision 
stage. Finally, the preference for using recall rather than recognition tests would necessarily 
enhance the importance of item searches. 


In fine, while Craik and his associates have demonstrated the importance of the depth 
variable, it is still very vaguely specified, and other factors besides depth are of equal 
importance. Although it seems probable that encoding depth importantly determines the nature 
of the stored trace, retention-test performance is affected by several additional variables, 
including the nature of the retrieval cue, the search strategy utilized, discriminability of relevant 
and irrelevant information, and response biases towards and away from certain classes of 
responses. 


Levels of processing: A reply to Eysenck 


Robert S. Lockhart and Fergus I. M. Craik 


These comments take up the major issues raised in Eysenck’s (1978) critique of Craik & Lockhart (1972): 
The problem of circularity in the definition of ‘depth’, the distinction between qualitative and quantitative 
differences in encoding, and the relationship between the concepts of depth, strength and elaboration. It is 
claimed that despite certain weaknesses, the original formulation possesses considerable heuristic value. 








Craik & Lockhart (1972) proposed that human memory might be understood in terms of the level 
or depth in the cognitive system to which stimuli are processed. Within this framework, it was 
suggested that deeply processed stimuli are well remembered, where greater depth implied 
greater degrees of semantic analysis and enrichment. These notions stemmed from a variety of 
sources including Treisman’s (1964, 1969) theory of attention, and are consistent with a large 
number of empirical findings (e.g. Hyde & Jenkins, 1969, 1973; Craik & Tulving, 1975). There 
were clearly deficiencies in the initial statement, however, and Eysenck (1978) discusses a 
number of them. Similar points have also been made by others, including Nelson (1977). The 
purpose of these comments is not so much to rebut Eysenck’s criticisms as to clarify a number 
of the issues raised in his critique. 

One major source of misunderstanding has concerned the nature and purpose of our 1972 
article; the underlying intent of that formulation was not to offer a theory of memory that was 
subject to direct empirical test in the hypothetico-deductive tradition. Rather, and as the subtitle 
of the paper stated, the major purpose was to present arguments in favour of a new framework 
for research. At that time the typical experiment in human memory consisted of presenting 
material to a subject with instructions to ‘study it’ for the purpose of a subsequent test of 
memory. Independent variables such as the nature of the material or the conditions of its 
presentation and testing were typically manipulated. The function of a theory of memory was to 
provide an account of what a subject does when confronted with this conjunction of 
circumstances, to predict the consequences of these actions for later tests of retention, and to 
infer properties of the structure of the memory system. The most explicit and the most 
successful theory of this kind was that of Atkinson & Shiffrin (1968). 

Craik & Lockhart argued that a more fruitful approach was to consider the memory trace as 
the by-product of ‘ordinary’ cognitive-perceptual operations. According to this view, a theory of 
memory is properly seen as one aspect of a general theory of cognitive processing — that aspect 
concerned with providing a systematic account of the traces of such operations, their durability, 
and subsequent retrievability. The appropriate aim of experimentation is thus to reproduce or 
model these operations under laboratory conditions in order to study their memorial 
consequences. The relevant data base for a theory of memory is then constituted by 
well-documented relationships between these operations and the conditions of retrieval. Some of 
Eysenck’s criticisms will be considered within this general context. 


The problem of circularity 

As Eysenck correctly points out, one major limitation on the concept of ‘depth’ or ‘levels’ of 
processing is the absence of an independent index of depth. In the absence of such an index 
there is an obvious danger of circularity in the descriptive logic; it is all too easy to conclude 
that if an event is well remembered it must therefore have been processed deeply. The point has 
been made forcefully in a recent article by Nelson: 


So far, the only ordering for depth of processing has been circular, with the various kinds of processing 
being ordered in terms of their effect on memory. With such an ordering, it is impossible to falsify (and, 
hence, impossible to test) the principle that deeper processing facilitates memory. Until such falsification 
becomes possible, statements about such-and-such a result being in accord with the deeper-processing 
principle are scientifically meaningless (Nelson, 1977, p. 165). 


Although gross differences in types of processing can be classified intuitively, intuition is a 
notoriously unreliable guide to the organization of mental processes. One attempt to equate 
decision latency with depth was shown to be unsatisfactory (Craik & Tulving, 1975). What then 
is the value of the concept of depth if it cannot be ascertained before the memory test is carried 
out, and if the retention level cannot be accurately predicted from a knowledge of the encoding 
operations? 

Our position is to concede immediately that circularity is inherent, at present, in the levels of 
processing approach, but to argue that the presence of circularity and the consequent lack of 
predictive power by no means render the ideas scientifically valueless. Given our very sketchy 
knowledge of how cognitive processes operate, it seems to us that of the two traditional goals of 
science, prediction and understanding (Toulmin, 1961), the latter should be strongly 
emphasized at present. Theorists are still searching for fruitful ways to conceptualize memory 
processes — should we describe memory in terms of differences in strength? as bundles of 
attributes or features? as a skilled act? as a search process? as repetition of mental operations? 
In view of this uncertainty and lack of theoretical agreement, an idea is likely to be helpful to the 
extent that it brings a measure of coherence to the data and provides firm guidance on the kinds 
of relationships that are important to study, and on the kind of data that should be collected. 

Some examples may serve to illustrate the point. The Theory of Evolution cannot be used to 
predict future developments in the natural world with any degree of rigour, far less than with a 
Spearman rank-order correlation approaching +1.0 (the criterion of acceptability advocated by 
Nelson, 1977), yet Darwin’s theory has proved quite helpful in understanding living organisms 
and their interrelations. Within psychology, an event is said to be reinforcing to the extent that it 
increases the probability of a response — there is no independent index of the effectiveness of a 
reinforcer, yet again the concept of reinforcement has guided theory and data collection. Within 
cognition, the notions of schemata (Bartlett, 1932), cell-assemblies and phase sequences (Hebb, 
1949) are again non-predictive and non-verifiable, yet have been tremendously influential and 
helpful to subsequent workers. In a similar sense, then, we argue that the concept of ‘depth of 
processing’ is not a fixed entity to be tested experimentally — it would be missing the point 
entirely to set out to prove that ‘levels of processing is wrong’ — but is an attempt to represent 
the relationships between cognitive functions in a way that makes sense of the data and that can 
be modified as the data demand. 

To amplify this last point: we strongly believe that theoretical constructs must be sensitive to 
the empirical data, even though they may not predict new data with accuracy. The role of 
theorizing, in this context, is to provide a coherent description of patterns of data and to suggest 
further empirical work. To the extent that the new data fail to conform to the theoretical 
description, the theory itself must be modified. In this sense we would argue that theoretical 
ideas formed within the original framework suggested by Craik & Lockhart (1972) have indeed 
been modified by the subsequent empirical work suggested by that framework (e.g. Craik & 
Tulving, 1975; Lockhart, Craik & Jacoby, 1976; Moscovitch & Craik, 1976; Craik, 1977; Fisher 
& Craik, 1977). 


Quantitative and qualitative differences 

A set of issues closely related to the criticism of circularity is that concerning quantitative and 
qualitative trace differences, the continuity or discontinuity of ‘levels’, and the question of 
whether depth is merely another term, synonymous with strength. The qualitative-quantitative 
issue is an important one, although when posed as a sharp dichotomy it becomes something of a 
red herring. Different forms, or levels, of processing may be qualitatively different, yet at the 
same time it may be quite valid to compare some aspect of them quantitatively. Objects or 
events are quantitatively different only in terms of some numerical representation of 
relationships amongst the objects; the objects themselves may be qualitatively different, in the 
sense that they possess other properties that preclude producing one by quantitative variation of 
the other. Apples and oranges may differ in weight, sweetness and judged preference, but one 
cannot concatenate apples to produce an orange. 

This argument may be summarized by the statement that within the context of levels of 
processing ideas, the quantitative and qualitative aspects refer to two distinct levels of 
explanation. Greater depth of processing may correlate with higher levels of retention, and in 
that limited sense reflects a quantitative change. However, the critical point is that the qualitative 
nature of the encoded trace is held to be different at different depths. Thus ‘depth’ differs from 
‘strength’ in that depth does not refer to more of the same thing, but refers to qualitatively 
different encodings. 

However, Eysenck is correct in pointing out that a claim for qualitative differences must be 
supported empirically: ‘the detection of quantitative variations in recall performance cannot be 
taken as direct evidence for qualitative variations in encoding’ (Eysenck, 1978). The most 
important line of evidence in favour of a notion of qualitative differences is that concerning the 
interactions of encoding and retrieval conditions. It was an acknowledged weakness of Craik & 
Lockhart’s position that retrieval processes were largely ignored. It is now clear that this 
deficiency was not merely a matter of incompleteness, but one which greatly reduced the 
strength of our argument, since only when retrieval conditions are considered does it become 
apparent that the effect of level of initial encoding is not a simple additive one (is not ‘more of 
the same thing’) and that the qualitative aspects of initial processing must be considered. On 
this point we are in complete agreement with Eysenck (1978). There is now substantial 
evidence, however, to support the statement that the effectiveness of a retrieval cue depends on 
the qualitative nature of the encoding (Tulving & Osler, 1968; Tulving & Thomson, 1973; Fisher 
& Craik, 1977). Put another way, the same depth of encoding may be associated with quite 
different levels of retention depending on the type of retrieval cue used. It is difficult to see why 
this should be so if the effect of the level of initial processing is merely to produce variations on 
a single dimension such as trace strength. Further evidence for qualitative differences at different 
levels of processing is the finding of different types of recognition errors depending on the type 
of encoding induced; more homophone errors were made following rhyme encoding, and more 
synonym errors were made following categorical encoding (Davies & Cubbage, 1976). 

The conclusion, then, is that depth is not synonymous with strength, since depth refers to 
qualitative differences. There is now good evidence to support the existence of such qualitative 
differences between memory traces. We agree with Eysenck (and Tulving, 1974) that a 
description of the interactions between input and output processing is essential to any complete 
theory of memory. 


‘Depth’ and ‘spread’ of encoding 

Whereas Craik & Lockhart (1972) emphasized differences in depth of encoding, a later article by 
Craik & Tulving (1975) put much stress on ‘spread of encoding’ or ‘further elaboration within an 
encoding domain’ (Lockhart et al. 1976). To clarify usage of these terms: by ‘depth’ 
we mean qualitative differences as outlined above; by ‘spread’ or ‘elaboration’ we mean more 
extensive processing of the same general type. Both Eysenck (1978) and Nelson (1977) rightly 
point out that the terms are vague and lack both operational definitions and independent indices. 
Also, it is admittedly unclear how much further ‘elaboration’ is equal to a given increase in 
‘depth’. These deficiencies present problems to be clarified by further experimental and 
theoretical analysis. Meanwhile, we agree strongly with Eysenck’s conclusion (1978) that 
‘the experimental evidence indicates that memory performance is affected not only by the depth 
of processing but also by the amount of processing and the nature of the processing at any given 
level’. 


Counter-evidence to depth and spread of encoding 


Having stressed above that the ‘levels of processing’ notions constitute an approach to the 
study of memory, and as such cannot be ‘proved wrong’, we shall limit ourselves in this section 
to comments on why the examples cited by Eysenck are not detrimental to our view. First, the 
finding by Bransford, Barclay & Franks (1972) that in a recognition test, subjects could not 
discriminate between two very similar sentences merely stresses that other factors besides 
encoding variables are important determinants of performance; clearly the similarity of lures is 
one such factor. Second, we have no wish to equate ‘phonemic’ or even ‘sensory’ with 
‘shallow’; sound qualities such as a person’s voice or a passage of music may be processed in a 
manner that entails considerable analysis of meaning. Thus, as particular sensory events become 
well learned and associated with the co-occurrence of other events, with implications, and 
outcomes, the encoded traces of such sensory events would gradually be transformed from 
shallow to deep representations in our terminology. 


Conclusions 


Although we have resisted many of Eysenck’s criticisms, we are in broad agreement with his 
major conclusion that the notion of depth of processing by itself is insufficient to give an 
adequate characterization of memory processes. Other factors such as elaboration or 
extensiveness of processing, the nature and amount of interference, the type of retrieval cue 
provided and how it in turn is processed by the system, are clearly important too. We share 
Eysenck’s view that notions of trace discriminability may become an important alternative 
formulation to that of depth. Indeed, differences in uniqueness or discriminability of traces may 
provide an underlying reason for the memorial consequences of different levels of processing 
(Nelson, Wheeler, Borden & Brooks, 1974; Jacoby, 1974; Klein & Saltz, 1976; Moscovitch & 
Craik, 1976). 

In conclusion, we would like to stress the heuristic value of our original statement. 
Experimental psychology generally, and the study of human memory in particular, has suffered 
greatly from theories and models that serve to focus research effort on details peculiar to their 
own formulation and that generate data that have no meaning or interest apart from the theory. 
The alternative is not a simple empiricism since one needs to know which questions are worth 
asking. What is needed, and what Craik & Lockhart (1972) sought to provide, is a framework for 
research that steers a course between an unstructured empiricism that assigns equal value to all 
data (simply because they are data) and a theory that is so limited in scope that it explains no 
more than the experimental paradigm used to test it and which, when finally discarded (or 
simply forgotten) leaves behind no worthwhile data base to serve as a stepping stone for the 
construction of a better theory. 

The fundamental question, then, is whether or not the research framework we have advocated 
is one which generates data that will outlast any specific theory constructed to account for them. 
We believe such to be the case. Indeed, it is the characteristic of ‘good’ data that they highlight 
the inadequacies of a specific theoretical formulation and indicate the ways in which it needs to 
be modified. Given our current level of understanding of human memory, one can only be 
suspicious of theories that seem to withstand all experimental attack, and emerge unscathed and 
unmodified. Progress in understanding requires a sound data base that can serve as a foundation 
for a critical, yet flexible, approach to theory. 


Levels of processing: A reply to Lockhart and Craik 


Michael W. Eysenck 


Lockhart & Craik (1978) argue that they have proposed a framework for the study of memory 
rather than a testable theory, and thus they cannot be proved wrong. In addition, they claim that 
understanding is at present preferable to prediction. My reactions to this ex cathedra statement 
are mixed. My reading of the Craik-Lockhart formulation was that some of the statements in it 
were potentially falsifiable and others were not. The general notion that the memory trace should 
be regarded as the product of cognitive-perceptual operations is probably not directly 
susceptible to empirical test. However, the hypothesis that Type I processing (i.e. repetition of 
analyses already carried out) should not enhance memory performance could be, and has been, 
tested. It is probably true that psychologists attach undue importance to theories making 
predictions that are consistently confirmed, even if the predictions are utterly trivial. It is worth 
remembering that it was well known at the time that Newton published his Principia that his 
theory could not even explain the motion of the moon. Nevertheless, the enhanced 
understanding produced by Newtonian theory ensured its continued acceptance. 

Several of the points made by Eysenck (1978) are not taken up by Lockhart & Craik (1978). 
For example, they favour a paradigm in which subjects perform various orienting tasks followed 
by an unexpected retention test, because of the degree of control over the subject’s processing 
afforded. However, an examination of the experimental evidence clearly indicates that the 
degree of control is less than total, and that subjects typically engage in a considerable amount 
of processing extraneous to that necessitated by the orienting task. For example, consider the 
Davies & Cubbage (1976) study cited with approval by Lockhart & Craik. If the orienting-task 
technique used by them had been completely effective, then subjects given a rhyme-orienting 
task would have performed at chance level when required to discriminate between the list word 
and its homophone. In fact, the result clearly indicated that some non-phonemic information 
must have been encoded. An implication is that interpretation of data obtained by means of the 
orienting-task technique is complicated by the fact that every orienting task produces 
encodings comprising an amalgam of several disparate features or attributes. 

Lockhart & Craik (1978) appear to subscribe to the encoding specificity principle, according 
to which successful retention depends on a match between trace and retrieval-cue information. 
As Eysenck (1978) noted, this allows for the possibility that deep levels of processing might be 
associated with extremely poor retention-test performance in the presence of certain retrieval 
cues. Recent evidence obtained by Morris, Bransford & Franks (1977) is entirely consistent 
with this theoretical framework. While semantic encodings were better recognized than 
phonemic encodings on a standard recognition test, the opposite result was obtained with 
rhyme recognition tests, where the targets were words which rhymed with the original study 
words, but had not been presented during acquisition. 

It is not clear that the findings of Morris et al. (1977) can be reconciled with Craik & 
Lockhart’s (1972) claim that ‘trace persistence is a function of depth of analysis, with deeper 
levels of analysis associated with more elaborate, longer lasting, and stronger traces’ (p. 675). 
Apart from the difficulty of resolving the apparent discrepancy between theory and data, there is 
a further problem. In terms of the encoding specificity principle, the effectiveness of different 
encoding operations can only be measured in terms of specified retrieval conditions. 
Accordingly, the hypothesis that deeper levels of analysis lead to stronger traces is logically 
inconsistent with the encoding specificity principle, since the hypothesis contains no reference to 
the conditions of retrieval. 

Finally, we come to the problem of explaining the usual memorial superiority associated with 
deep levels of analysis. Lockhart & Craik (1978) and Eysenck (1978) are agreed that a plausible 
interpretation is that deep encodings are more likely than shallow encodings to be distinctive. In 
recent work, I have investigated recognition memory for words presented on an acquisition trial, 
where some of the words had also been presented on a pre-exposure trial, and some had not. 
The assumption was that pre-exposure would reduce recognition-memory performance to the 
extent that the pre-exposure encoding of a word overlapped its study-trial encoding. Since 
pre-exposure substantially reduced recognition performance for phonemically encoded words, 
but had no effect for semantically encoded words, it was concluded that semantic encodings are 
more distinctive than phonemic encodings. Let us hope that future research will enlighten us as 
to whether or not there is an inherent circularity in the concept of ‘distinctiveness’, and as to 
how the distinctiveness approach can be integrated with the encoding specificity principle. 


Evaluating the worth of gambles 


W. R. Crozier 








The expected value model which includes only the objective pay-offs and probabilities correlates highly with 
subjects’ bids for gambles but not with decision-making behaviour when other dependent variables are 
considered. The correlation is affected by changes in the task which emphasize the riskiness of the gambles. 
It is argued that these results can be accounted for by assuming that subjects formulate simple decision rules 
when confronted with a difficult, computational task. 


Subjects in decision-making experiments are frequently faced with the task of making some 
evaluation of the worth to them of a gamble of the form: Win amount A with probability P or 
lose amount B with probability Q. Generally P = 1 − Q, although in the case of duplex gambles 
introduced by Slovic & Lichtenstein (1968) P and Q are independent. The evaluation of worth is 
either a rating (e.g. Anderson & Shanteau, 1970), or a bid which can take one of several forms, 
e.g. (1) state the largest amount of money you would pay the experimenter in order to play the 
gamble. For one you do not want to play, state the smallest amount the experimenter would have 
to pay you to play; (2) assume you would have to play the gamble unless you could sell it to the 
experimenter. For attractive gambles state the least amount you would accept for the gamble, 
and for unattractive ones the amount you would pay the experimenter to take it off your hands. 

There have been two principal approaches to conceptualizing behaviour in this situation. The 
first devises tests of the SEU model, which states that the subject maximizes the subjectively 
expected utility of the gamble, S(P)·U(A) − S(Q)·U(B). The second, or information-processing 
approach, examines how subjects integrate the four sources of information, A, B, P, Q, into one 
evaluation. 
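
As a rough illustration of the SEU expression, the following Python sketch evaluates S(P)·U(A) − S(Q)·U(B) for a single gamble. The function names, the square-root utility function and the example gamble are assumptions introduced here for illustration; none of them is taken from the studies cited. When S and U are both identity functions, the expression reduces to the objective expectation A·P − B·Q.

```python
def seu(win, p_win, lose, p_lose, utility=lambda x: x, subj_prob=lambda p: p):
    """Subjectively expected utility of a gamble: S(P)*U(A) - S(Q)*U(B).

    `utility` plays the role of U and `subj_prob` the role of S.
    Both default to the identity, which gives the objective expected value.
    """
    return subj_prob(p_win) * utility(win) - subj_prob(p_lose) * utility(lose)


def concave_utility(amount, exponent=0.5):
    """Hypothetical utility function (an assumption, not from the source)."""
    return amount ** exponent


# Hypothetical gamble: win 4 with probability 0.25, lose 1 with probability 0.75.
gamble = dict(win=4.0, p_win=0.25, lose=1.0, p_lose=0.75)

objective = seu(**gamble)                            # identity S and U
subjective = seu(**gamble, utility=concave_utility)  # concave U, identity S

print(objective, subjective)
```

With these invented numbers the same gamble is attractive under the objective expectation but unattractive once a concave utility is assumed, which is the kind of divergence that tests of the two models try to detect.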


Representativeness of the task 


This study is intended to question the assumption that bidding for such gambles is a 
‘miniaturization’ (Coombs, 1971) of common decision situations, i.e. that the results and models 
of these experiments are readily generalizable to other decision situations. It suggests that the 
pattern of results found in these experiments may be specific to the task, and due to simple 
strategies adopted by subjects in a situation which is hard to understand and which excludes 
much of the risk of the risky situations of which it is meant to be representative. 

That the results obtained in this task are not generalizable to other tasks is suggested by 
evidence that bids correlate highly with the EV model, which asserts that subjects maximize the 
expected value A·P − B·Q of the gamble, while that model has been rejected when other 
dependent variables or decision tasks are investigated. That simplifying strategies are being 
adopted is suggested by evidence that bidding behaviour is susceptible to changes in presentation 
conditions which make calculations simpler or which make the risks involved more salient, but 
which should not lead to changes in response. 

These two lines of evidence are discussed below. Taken together, they suggest that subjects 
behave in bidding situations as if they saw their task as one of estimating EV, while they do not 
see other decision tasks in this light. 


EV as a predictor of evaluations 


The expected value model has an uncertain status in the study of decision making, and 
researchers have not usually set out to investigate the correlation between EV and evaluation. 
Where such correlations have been reported, they have been consistently high. Edwards (1969) 
reported r = 0.94 for the bids of Las Vegas gamblers, while Slovic & Lichtenstein (1968) give 
mean correlations with EV of 0.79 (ratings, n = 88) and 0.80 (bids, n = 125), the subjects being 
undergraduates. Tversky (1967a) derived measures of utility and subjective probability and 
predicted selling prices of 11 prisoners. The rank-order correlations between data and the additive 
solutions obtained under the SEU and EV models were 0.988 and 0.922 respectively. When the 
models were used to predict a further series of gambles the average absolute deviations in cents 
for the two models were 13.21 (SEU) and 14.26 (EV). The behaviour of three subjects was ‘in 
almost perfect agreement with the EV model’, while the SEU model was better than the EV 
model for six out of the remaining subjects, ‘although the difference was statistically significant 
for only two subjects’. 
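
The two kinds of fit statistic quoted in this paragraph, a correlation between bids and model values and an average absolute deviation, can be sketched as follows. This is a minimal Python illustration only: the gambles and bids are invented for the example, the function names are hypothetical, and a product-moment correlation is used for simplicity where some of the studies cited used rank-order correlations.

```python
from math import sqrt


def expected_value(win, p_win, lose, p_lose):
    """Objective expected value of a gamble: A*P - B*Q."""
    return win * p_win - lose * p_lose


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def mean_abs_deviation(predicted, observed):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Invented data: four gambles (A, P, B, Q) and one hypothetical subject's bids.
gambles = [
    (2.0, 0.50, 1.0, 0.50),
    (4.0, 0.25, 1.0, 0.75),
    (1.0, 0.75, 2.0, 0.25),
    (3.0, 0.50, 3.0, 0.50),
]
bids = [0.45, 0.10, 0.30, -0.05]

evs = [expected_value(*g) for g in gambles]
r = pearson_r(evs, bids)
deviation = mean_abs_deviation(evs, bids)

print(round(r, 3), round(deviation, 3))
```

A high correlation and a small average deviation are separate criteria: bids can be ordered much as EV orders the gambles while still departing from the EV amounts.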

Coombs, Bezembinder & Goode (1967) report correlations of 0.97 (buying bids), 0.99 (selling 
bids), and 0.99 (fair price bids). Subjects were 18 mature Negro adults and 22 undergraduates, 
and the correlations were not different for the two groups. In a replication of part of Slovic & 
Lichtenstein’s 1968 study, Crozier (1974) found that 100R², the coefficients of determination, for 
12 subjects, each bidding for 60 gambles, were 50.39, 82.66, 80.39, 74.23, 80.34, 79.29, 93.93, 
77.55, 79.91, 78.31, 76.41 and 75.90. Apart from the first subject, over 74 per cent of the 
variation in bids could be accounted for by variation in the four components, A, B, P, Q. 

Correlations remain reasonably high when three-outcome gambles (three amounts to win or 
lose, three probabilities of winning and losing) are considered (Crozier, 1974). Ten subjects 
evaluated two- and three-outcome gambles. The mean correlations for the latter were 0.75 (two 
winning and one losing outcome) and 0.86 (one winning, two losing). When questioned, eight out 
of ten subjects described their behaviour in terms of ‘averaging’, reducing the outcomes by 
some amount depending on the relative size of the probabilities. They did not claim to work out 
the exact price, e.g. ‘I tried to work out the average ... used the range of payoffs to reach a 
figure ... the amount to win plus the amount to lose ... some figure obtained by weighting 
numbers by their probabilities ... not very much influenced by the absolute size of pay-offs, only 
with the range of bets’. Seven out of ten subjects thought the three-outcome gambles no more 
difficult to evaluate than the two-outcome ones, and all said that they processed the former by 
reducing them to two-outcome ones, combining the two similar pay-offs. A further study of 
evaluations of three-outcome gambles where the pay-offs and probabilities had to be stored in 
memory before the evaluation was made resulted in a median correlation of 0.72 with EV and 
0.73 when estimated probabilities were substituted for the objective ones. Table 1 gives the 
mean correlations for the 15 subjects. 

Table 1. Correlations between bids and two expectation models. Each subject bid for 36 
gambles. Decimal points omitted. 


Subject 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 


EV 5 0 17 68 6 8B 9 81 6 N) RB 7 9M B 64 
SEV 54 79 32 5 6 07 BZB B 6 NV 88 93 M gZ TI 

All these results show a close relationship between EV and evaluation, and could be 
interpreted as support for EV as a descriptive model, or as a demonstration of the rationality of 
subjects. However, when the dependent variable is changed to choosing between gambles, the 
EV model is less able to predict behaviour. In the same experiment in which they reported 
correlations between bids and EV of 0.97 and 0.99, Coombs et al. (1967) reported that, when 
paired comparison data were analysed, the EV model could be rejected for 34 out of the 37 
subjects on whom it could be tested. Lindman (1971) and Lichtenstein & Slovic (1971) found 
discrepancies in orderings of gambles based on bids and paired comparisons, including reversals 
of choice. Payne (1973) reviews evidence from several experiments where subjects’ stated 
preferences between gambles are incompatible with the EV model, including clear preferences 
where there should be indifference (Payne & Braunstein, 1971), and systematic intransitivities 
(Tversky, 1969). When subjects are asked to place a bet on whether an outcome will occur, 
there is evidence that their behaviour is incompatible with the EV model, which predicts that 
subjects should always bet on the more likely outcome. Irwin & Snodgrass (1966) and Irwin & 
Graae (1968) found that subjects did not follow this principle, and that their behaviour was 
influenced by the introduction of pay-offs which made some outcomes attractive but should not 
have been relevant to their choice of bet. They considered these results as evidence of an 
interaction effect between the perceived likelihood of an event and its attractiveness, which 
again is incompatible with expectation models of decision making. Such an effect has been widely 
reported when subjects are asked to make estimates, predictions or bets, e.g. Marks (1951), 
Pruitt & Hoge (1965), Slovic (1966) and Morlock (1967). Those studies which have estimated 
probabilities from evaluations of gambles (Tversky, 1967a, b; Anderson & Shanteau, 1970) have 
failed to find evidence of this effect, and report behaviour compatible with expectation models. 
These results suggest that the EV model is a good predictor of gambling behaviour when a bid 
is the dependent variable but is less good when subjects place bets or choose between gambles. 


Determinants of performance in evaluation tasks 


While correlations between evaluation and EV are high, they seem to be affected by variables 
which are related to the ease of making calculations but which do not require a different strategy 
on the part of the subject. Herman & Bahrick (1966) found that the number of risk dimensions to 
be processed affected the correlation, and Miller & Meyer (1966) found that the way in which 
the information in the gamble was presented was important. 

The correlations are also affected by changes in the experimental conditions which draw 
attention to the reality of the amounts to be won and lost. Both testing the SEU model and 
examining subjects’ information-processing strategies require subjects to make a large number of 
evaluations of gambles, and the outcomes of these gambles are independent and not cumulative 
from trial to trial of the experiment. Any wins or losses in actual play are postponed to the end of 
the experimental session, since changes in the subject’s financial situation during the experiment 
would change the utility or value to him of the pay-offs in gambles later in the session (these 
problems are discussed more fully in Krantz & Tversky, 1965). Sometimes the pay-offs are 
hypothetical and money does not change hands, and often the subjects are paid for their attendance 
at the experiment. Lichtenstein, Slovic & Zink (1969) found that when only one gamble rather 
than a large number was played, behaviour was not in keeping with the EV model. Slovic, 
Lichtenstein & Edwards (1965) reported that when a very large number of gambles were to be 
played, and when pay-offs were imaginary, subjects tended to adopt the simplest possible 
strategy and followed rules like, ‘minimize the maximum loss’ or ‘select the gamble with the 
higher probability of winning’. Slovic (1969) found differences in strategy between subjects who 
made hypothetical choices and those who played a number of gambles to determine their payment 
for participating in the experiment. The second group were more cautious and preferred better 
odds and smaller losses. 
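
The contrast between an expectation model and the simple rules just quoted can be made concrete with a small Python sketch. The `Gamble` type, the function names and the two example gambles are assumptions made for illustration; the rules are paraphrased from the text and are not a reconstruction of any particular study's procedure.

```python
from typing import NamedTuple


class Gamble(NamedTuple):
    win: float      # amount A
    p_win: float    # probability P of winning
    lose: float     # amount B
    p_lose: float   # probability Q of losing


def expected_value(g: Gamble) -> float:
    """Objective expected value: A*P - B*Q."""
    return g.win * g.p_win - g.lose * g.p_lose


def choose_by_ev(a: Gamble, b: Gamble) -> Gamble:
    """Expectation rule: pick the gamble with the higher expected value."""
    return a if expected_value(a) >= expected_value(b) else b


def choose_min_max_loss(a: Gamble, b: Gamble) -> Gamble:
    """'Minimize the maximum loss': pick the gamble with the smaller loss."""
    return a if a.lose <= b.lose else b


def choose_higher_p_win(a: Gamble, b: Gamble) -> Gamble:
    """'Select the gamble with the higher probability of winning.'"""
    return a if a.p_win >= b.p_win else b


# Invented pair of gambles on which the rules disagree.
safe = Gamble(win=1.0, p_win=0.8, lose=0.5, p_lose=0.2)
risky = Gamble(win=10.0, p_win=0.3, lose=2.0, p_lose=0.7)

for rule in (choose_by_ev, choose_min_max_loss, choose_higher_p_win):
    print(rule.__name__, rule(safe, risky))
```

Each simple rule inspects a single component of the gamble, so it requires no multiplication of amounts by probabilities. On the invented pair above both simple rules select the gamble that the expectation rule rejects.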

These variations in design make the riskiness of the gambles more salient. The instructions to 
the subject may also emphasize the computational aspects of the experiment and play down the 
risky aspects. It is difficult to explain the concept of selling price to most subjects, especially the 
shift in response required by attractive and unattractive (positive and negative EV) gambles. It is a 
response to a gamble that subjects are unfamiliar with, and (in the present writer’s experience) 
most subjects initially assume that the experimenter is asking them to bet a certain amount of 
money, and have to be dissuaded from this perception of the task. It has also to be explained to 
subjects that they should never bid more than the winning pay-off or less than the losing pay-off; 
they should choose their bid between these extremes to reflect the attractiveness of the gamble; 
they may not simply refuse to play a gamble. These standard instructions impose clear demand 
characteristics on the subject, and the subject who follows them closely will behave in keeping 
with the EV model. 


Conclusion 


It is argued here that subjects who take part in experiments where they are required to make 
evaluations of the worth of gambles are presented with a task that is unfamiliar and difficult to 
understand. This task difficulty, and the disguise of the riskiness of the gambles due to a large 
number of similar gambles presented on slides or cards, and delayed or absent feedback from 
actually playing the gambles, may result in the subject developing decision strategies specific to 
this task. He may be guided in his choice of strategy by the instructions which stress 
computations within a range specified by the experimenter. Such an interpretation would account 
for the pattern of results reported above. The detailed instructions about the kind of response 
required, and the delay in playing any gambles so that no money changes hands, induce the 
subject to see his task in effect as that of estimating the EV of the gambles. He can do this 
successfully and will be aided by transformations of the gambles which make estimation easier. 
If only one gamble is to be bid for, the amounts to win and lose will be salient. With judgements 
of preference or placing bets the tasks are simple to explain and familiar to the subject 
(examples are ‘would you toss a coin for £1 or accept 50p for sure?’ and ‘I bet you 50p that that coin 
will show heads’). Personality differences, preferences for risk, optimism and pessimism may be 
revealed in these tasks, and are notably absent from studies of bidding behaviour. 

There is no direct evidence to support these hypotheses, and further research could test them, 
and explore where expectation models hold and where they do not. 


Graphic rating scales - How many categories? 


Stuart J. McKelvie 





Two experiments investigated the reliability and validity of a continuous and various categorized (five, 
seven, 11) rating scales on two different tasks (involving attitude judgement and psychophysical judgement). 
Although subjects appeared to prefer the continuous scale, it did not offer any advantages in terms of 
reliability or validity. It is proposed that a relatively small number of categories (five or six) should generally 
be used because (a) the five-category scale was most reliable (at least on the attitude judgement task), 
(b) confidence judgements made by subjects using the continuous scale indicated that they were operating 
essentially with five or six categories, and (c) other evidence suggests that whereas there is no psychometric 
advantage in a large number of scale categories (greater than nine to 12), there may be a loss of 
discriminative power and validity with fewer than five. 


Despite considerable research over the last 50 years, investigators have failed to agree on the 
optimal number of categories for a rating scale. Recommendations have ranged from as few as 
two categories (Peabody, 1962; Komorita, 1963) to as many as 20 (Champney & Marshall, 1939) 
or even 30 (Gulliksen, 1958). Indeed, a number of authors have concluded that there is no single 
optimal number of categories; rather, the appropriate number is a function of the type of 
stimulus being rated (Cronbach, 1946; Bendig, 1954a; Garner, 1960; Komorita & Graham, 1965; 
Masters, 1974). 

The latter argument implies that stimuli which are highly discriminable will require more 
categories than those which are difficult to discriminate; and that, for a given stimulus, accuracy 
of judgement will be lost if too few categories are employed (the scale is too coarse), whereas 
no accuracy will be gained if too many categories are used (the scale is too fine). 

Although many writers agree that the optimal number of categories is that which allows 
maximum discrimination by raters, they have not all adopted the same measure of discrimination. 
Five kinds of measure have been advocated: discriminability scaling (Garner, 1960), information 
transmitted (Bendig, 1954b; Bendig & Hughes, 1953; Garner, 1960; Hake & Garner, 1951), 
independent duplication of judgement (Tukey, 1950; Gulliksen, 1958), the standard error of 
measurement (Gulliksen, 1958; Ramsay, 1973), and reliability (Symonds, 1924; Likert, 1932; 
Champney & Marshall, 1939; Ferguson, 1941; Remmers & Ewart, 1941; Cronbach, 1946; Bendig, 
1953, 1954a; Guilford, 1954; Komorita, 1963; Komorita & Graham, 1965; Matell & Jacoby, 1971; 
Finn, 1972; Masters, 1974; Lissitz & Green, 1975). Each of these will be discussed in turn, 
although it is clear that the most popular measure of discrimination has been reliability. 


Direct measures of discrimination 


Garner (1960) argues that direct measures of discrimination are best. He found that, as the 
number of rating categories increased from four to 20, the standard deviation of discriminability 
scale values increased in a linear fashion, and concluded that subjects were still capable of 
discriminating with 20 categories. Garner also reported a linear increase in information 
transmitted up to 20 categories, although the rate of improvement was not great. Similarly, 
Bendig (1953) and Bendig & Hughes (1953) demonstrated a linear increase in information 
transmitted up to nine categories, although Bendig (1953) noted that the further increase to 11 
categories was much less marked. He suggests that this slower rise may represent the beginning 
of Hake & Garner’s (1951) ‘diminishing returns effect’: they reported that significantly more 
information was transmitted with ten categories than with five, but that the improvement 
thereafter (with 50 categories) was slight. Notably, Kintz, Parker & Boynton (1969) also found a 
fairly rapid rise in information transmitted up to 12 categories, with a smaller increase at 44 
categories. 
However, these studies of information transmission received a severe blow from two papers 
by Macrae (1970a, b). He points out that the information measure used by these investigators 
produces an overestimation of the amount of information transmitted and, furthermore, that this 
overestimation becomes greater as the number of categories increases. Thus, reported increases 
in information transmission with an increasing number of categories may be spurious; indeed, 
the ‘small increases’ found with more than 12 categories probably represent a real drop in 
information transmitted. 
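
The kind of overestimation described above can be illustrated with a small simulation. The sketch below is not Macrae's own analysis: it assumes the usual plug-in estimate of transmitted information computed from a stimulus-response frequency table, and it assumes 100 trials per run, both of which are illustrative choices. Responses are generated independently of the stimuli, so the true information transmitted is zero and any positive mean estimate is pure bias.

```python
import math
import random
from collections import Counter


def transmitted_information(pairs):
    """Plug-in estimate (in bits) of information transmitted from stimulus to response."""
    n = len(pairs)
    joint = Counter(pairs)
    stim = Counter(s for s, _ in pairs)
    resp = Counter(r for _, r in pairs)
    bits = 0.0
    for (s, r), count in joint.items():
        p_joint = count / n
        bits += p_joint * math.log2(p_joint / ((stim[s] / n) * (resp[r] / n)))
    return bits


def mean_estimate_under_guessing(categories, trials=100, runs=200, seed=1):
    """Average plug-in estimate when responses are unrelated to stimuli (true value is 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        pairs = [(rng.randrange(categories), rng.randrange(categories))
                 for _ in range(trials)]
        total += transmitted_information(pairs)
    return total / runs


# Hypothetical numbers of categories; the bias grows as the table gets larger.
for k in (4, 12, 20):
    print(k, round(mean_estimate_under_guessing(k), 3))
```

With a fixed number of trials, the spurious estimate increases with the number of categories, which is the direction of error the text attributes to the uncorrected measure.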

Taking a slightly different approach to the problem of discrimination, Tukey (1950) 
recommends that the scale should be sufficiently fine to ensure that an independent duplicated 
judgement will be exactly the same 10 per cent of the time. More than 10 per cent agreement 
implies that the scale is too coarse; less implies that it is too refined. However, in attempting to assess 
the 54 per cent agreement obtained by Osgood, Suci & Tannenbaum (1957) on their 
seven-category semantic differential scale, Gulliksen (1958) notes that it is extremely difficult to 
obtain independent judgements, and suggests that a criterion of 20 or 30 per cent agreement 
might be more practical. (Note that even by this criterion, Osgood et al.'s seven-category scale is 
too coarse, and should be made finer.) 
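
As a minimal sketch of how such an agreement criterion might be applied, the following computes the proportion of exactly repeated judgements and compares it with a target value. The function names and the ratings are hypothetical; only the 10 per cent and 30 per cent criteria come from the text.

```python
def exact_agreement(first, second):
    """Proportion of paired judgements that fall in exactly the same category."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    same = sum(1 for a, b in zip(first, second) if a == b)
    return same / len(first)


def scale_verdict(proportion, target=0.10):
    """Compare an agreement proportion with a target criterion (Tukey's 10 per cent by default)."""
    if proportion > target:
        return "too coarse"
    if proportion < target:
        return "too fine"
    return "at criterion"


# Hypothetical seven-category ratings from two independent judgements.
first = [1, 4, 4, 6, 7, 3, 2, 5, 5, 4]
second = [1, 4, 5, 6, 6, 3, 2, 5, 4, 4]
agreement = exact_agreement(first, second)
print(agreement, scale_verdict(agreement), scale_verdict(agreement, target=0.30))
```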

Gulliksen went on to propose that a better measure of discrimination would be the standard 
error of measurement, but it was only recently (Ramsay, 1973) that this idea was followed up. 
Ramsay examined the standard errors of measurement generated by maximum-likelihood 
estimates of scale values for different numbers of categories (over various discriminal 
dispersions) and concluded that there was little gain in precision of measurement above seven to 
nine categories. 

Taken together, these studies of direct measures of discrimination suggest that there is no 
advantage to be gained with a large number (more than 12) of categories. There is no clear 
evidence of significant increases in information transmitted above 12 categories, and Ramsay’s 
study indicates that standard errors of measurement do not decrease significantly with more than 
nine categories. 


Reliability 


The most commonly used measure of discrimination allowed is reliability. Corresponding to 
Macrae’s work on the effect of categorization on the measurement of information transmission, 
various theoretical studies have been carried out on the effect of categorization on the correlation 
coefficient (Symonds, 1924; Champney & Marshall, 1939; Peters & Van Voorhis, 1940; Lissitz 
& Green, 1975; Wylie, 1976). However, most investigations of reliability as a function of the 
number of categories are empirical and involve the estimation of internal consistency reliability 
(Likert, 1932; Remmers & Ewart, 1941; Bendig, 1953, 1954a; Komorita, 1963; Komorita & 
Graham, 1965; Matell & Jacoby, 1971; Finn, 1972; Masters, 1974) or test-retest reliability (Matell 
& Jacoby, 1971). 


Effects of categorization on the correlation coefficient 


Symonds (1924) began with the Sheppard—Kelley formula for correcting a correlation coefficient 
for coarseness of grouping and derived the number of class intervals necessary to ensure that 
‘true’ reliability (i.e. the reliability of continuous data) would not drop more than a specified 
amount (equivalent to a rise of 0.0213 in the coefficient of alienation) when the continuous data 
were categorized. Since Symonds felt that the true reliability of ratings of human traits was 
between 0.60 and 0.70, he recommended the corresponding derived number of categories (seven 
to nine) for use in this area. However, Champney & Marshall (1939) challenged the 
Sheppard—Kelley formula as a point of departure. When they calculated correlations for 
continuous and categorized data, the (uncorrected) correlations for categorized data were 
generally lower than would have been predicted by Symonds; moreover, the correlations 
appeared to increase up to 45 categories and then to decrease with continuous data. Champney 
& Marshall concluded that, as the number of categories increases, reliability increases until the 
optimal point of discrimination is reached, after which it will fall; they also suggested that this 
optimal point will depend on the conditions of measurement, and is not obtainable by a simple 
correction of the correlation coefficient for grouping as assumed by Symonds. 

Despite Champney & Marshall’s criticisms of the Sheppard—Kelley formula as a point of 
departure, others have advocated its use for estimating the amount of underestimation of the 
correlation coefficient when data are categorized (Peters & Van Voorhis, 1940; Guilford, 1965). 
Also, Wylie (1976) demonstrated that, even when the underlying distributions are skewed, the 
correction for coarse grouping can be used with confidence between three and 15 categories (it is 
unnecessary beyond this, the degree of underestimation being slight). It is notable, however, 
that inspection of the correction factors provided by Peters & Van Voorhis (1940, p. 398) reveals 
that the amount of underestimation only becomes extremely serious (i.e. obtained value is less 
than 90 per cent of true value) when the number of categories is below six. This conclusion is 
supported by a recent Monte Carlo study of the effect of the number of scale points on 
reliability (Lissitz & Green, 1975). Lissitz & Green generated sets of continuous data according 
to the assumptions of classical test theory, recategorized them into two, three, five, seven, nine 
or 14 categories, then calculated internal consistency (Cronbach’s α) and test-retest reliability 
(Pearson product moment r) measures. By iterating this procedure 100 times, they were able to 
calculate the means and standard deviations of the reliability estimates for each type of scale. 
The mean reliabilities rose from two to five categories and remained stable thereafter, while 
the standard deviation of the reliabilities fell from two to five categories and likewise remained stable 
thereafter. Therefore, they concluded that reliabilities obtained with very small numbers of 
categories (two, three) not only underestimate true reliability but also vary considerably, and 
that there is no significant gain in reliability above five categories. 
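
A Monte Carlo procedure of this general kind can be sketched as follows. This is an illustration only, not Lissitz & Green's program: the true reliability (0.8), the sample size (50), the equal-width cut points spanning three standard deviations either side of the mean, and all function names are assumptions made for the example. It covers the test-retest (Pearson r) case only.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def categorize(value, k, low, high):
    """Map a continuous value onto categories 1..k of equal width between low and high."""
    if value <= low:
        return 1
    if value >= high:
        return k
    return min(k, int((value - low) / (high - low) * k) + 1)


def simulate(k, true_reliability=0.8, n_subjects=50, iterations=100, seed=0):
    """Mean and standard deviation of test-retest r for data recategorized into k categories."""
    rng = random.Random(seed)
    # Classical test theory: observed = true + error, with true-score variance 1.
    error_sd = math.sqrt((1 - true_reliability) / true_reliability)
    limit = 3 * math.sqrt(1 + error_sd ** 2)
    estimates = []
    for _ in range(iterations):
        true_scores = [rng.gauss(0, 1) for _ in range(n_subjects)]
        test = [categorize(t + rng.gauss(0, error_sd), k, -limit, limit)
                for t in true_scores]
        retest = [categorize(t + rng.gauss(0, error_sd), k, -limit, limit)
                  for t in true_scores]
        estimates.append(pearson_r(test, retest))
    return statistics.fmean(estimates), statistics.stdev(estimates)


for k in (2, 3, 5, 7, 9, 14):
    mean_r, sd_r = simulate(k)
    print(k, round(mean_r, 3), round(sd_r, 3))
```

Under these assumptions the coarsest scales give the lowest mean coefficients; how quickly the curve flattens depends on the cut points and the distribution chosen, so the sketch should not be read as reproducing the published values.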

The studies of Lissitz & Green (1975) and Wylie (1976) cast doubt on Champney & Marshall’s 
(1939) conclusion that significant increases in the correlation coefficient can be expected with 
large numbers of categories. Rather, it seems that while grouping does tend to lower the true 
correlation coefficient, the effects are only serious with less than five or six categories. 


Empirical investigations of reliability 


The most popular criterion of reliability adopted by investigators studying the empirical 
relationship between number of scale categories and reliability has been an internal consistency 
measure (e.g. Cronbach’s, 1951, coefficient α). Remmers & Ewart (1941) reported an increase in 
reliability as the number of categories rose from two to five, followed by a slight decline at 
seven categories, a result which was confirmed by Finn (1972) who also showed a further decline 
with nine categories. In contrast, Bendig (1953, 1954a) found that reliability was essentially 
constant over the range from two to 11 categories, and both Komorita (1963) and Komorita & 
Graham (1965) reported no change in reliability from two to six categories, at least for 
homogeneous items; for heterogeneous items, the six-category scale was more reliable. It is 
difficult to conclude, however, that reliability is only lowered with a small number of categories 
on heterogeneous items, since Masters (1974) obtained different results with two homogeneous 
questionnaires: with one, reliability rose from two to four categories then remained constant up 
to seven; in the other there was no change in reliability from two to seven categories. Finally, in 
a more extensive study, Matell & Jacoby (1971) reported that reliability was invariant over the 
range two to 19 categories. They also obtained estimates of test-retest reliability, and found the 
same result. 

Taken together, these empirical studies appear to demonstrate that while there is no increase 
in reliability above six categories, there may be a drop below this number. However, none of 
these studies attempted to correct the measure of reliability for the effects of coarse grouping; 
given that α and r only appear to be underestimated significantly when the number of categories 
falls below five, the empirical studies might be interpreted to mean that reliability is essentially 
invariant over a large range of categories, since any obtained drop in reliability below six 
categories could be attributed to grouping error. Indeed, the case might even be made that true 
reliability is higher with fewer than six categories, since many of the studies (Bendig, 1953, 1954a; 
Komorita, 1963; Komorita & Graham, 1965; Matell & Jacoby, 1971; Masters, 1974) reported no 
drop in (uncorrected) reliability in this range. The extreme position on this issue is occupied by 
Komorita (1963) and Peabody (1962). With bipolar Likert-type scales, they found that 
extremeness (intensity) contributed relatively little to the total score, which was mainly a 
function of direction; both authors concluded that a dichotomous (two-category) scale was 
adequate. 


Validity 


Although the amount of discrimination allowed by a scale is important, a number of authors 
(Cronbach, 1946, 1950; Komorita & Graham, 1965; Matell & Jacoby, 1971) have pointed out that 
increases in discrimination are without value unless they are accompanied by increases in 
validity. Matell & Jacoby explored this question by obtaining empirical estimates of both 
concurrent and predictive validity for scales ranging from two to 19 categories; they found no 
systematic effect of the number of categories on either type of validity. However, Green & Rao 
(1970), in a theoretical study of factorial validity, reported that the recovery of interpoint 
distances in their original data was generally less successful if the data rested on two- or 
three-category scales than if it was based on scales of six or 18 categories. Following criticisms 
by Benson (1971), Green & Rao (1971) admitted that their results could not automatically be 
generalized to all situations (e.g. verbal step scales), but they again concluded that two- or 
three-category scales may be weak. Finally, using a more conventional factor analytic approach, 
Schutz & Rucker (1975) found that the same factors emerged from data collected on two-, three-, 
six- and seven-point scales. Thus it would appear that validity is relatively unaffected by the 
number of scale categories, although it is possible there is a loss with a very small number (two 
or three). 


The present experiments 


The above review suggests that, in terms of direct measures of discrimination, reliability, and 
validity, there is no apparent advantage of a large number of scale categories (i.e. greater than 
nine to 12); but with a small number of categories (less than five or six) there seems to be a loss 
of discriminative power (as indicated by direct measures of discrimination) and possibly a loss in 
validity. However, although the range of categories covered by these studies is fairly wide (two 
to 44), very few investigations have employed a continuous scale. This scale would seem to be 
of some interest, since a number of authors have proposed that subjects might find it most 
pleasing to use (Symonds, 1924; Matell & Jacoby, 1971; Ramsay, 1973). Although some of the 
studies discussed above have begun with continuous data, which has been recategorized 
(Champney & Marshall, 1939; Ramsay, 1973; Lissitz & Green, 1975), there do not appear to 
have been any attempts to estimate empirically the reliability and validity of ratings obtained on 
a continuous scale, although it has been suggested by Johnson (1972). 

The major purpose of the two experiments reported here is to compare the continuous scale 
(continuum, CTM) with various category scales (five, seven, 11) with respect to reliability 
(test-retest) and validity (empirical and face). Use of the continuous scale also provides the 
opportunity to assess directly any spurious effects of categorization on the correlation 
coefficient: empirically gathered data on each of the category scales can be compared with data 
collected on the continuum and transformed into the various categories. Although the category 
data could have been corrected using the general Peters & Van Voorhis (1940) formulae, the 
present procedure provides a more direct control for the effects of categorization on r. 


Since some authors (Cronbach, 1946; Guilford, 1954; Bendig, 1954a; Garner, 1960; Komorita 
& Graham, 1965; Masters, 1974) suggest that the effect of number of categories on psychometric 
properties of the scale is a function of the type of stimulus being rated, the present experiments 
involved two different tasks: a relatively heterogeneous attitude-judgement task, in which the 
concept ‘French-Canadian’ was rated on ten bipolar scales, and a more homogeneous 
psychophysical task in which the pitch of each of ten tones was rated relative to two standard 
tones. The empirical validities of the scales were assessed with respect to this latter task; here, 
‘correct’ responses could be established in advance, and the validity (accuracy) of each scale 
measured by correlating obtained with correct responses on that scale. 

Finally, reliabilities of four different scores were calculated: test reliability (total, individual 
scores) and rater reliability (total, individual scores). Bendig (1954a) distinguished test reliability 
(when the test is used to differentiate raters) from rater reliability (when the test is used to 
differentiate items); in the former case, the tester typically totals responses across items, to give 
each person a score; in the latter, he totals across subjects, to give each item a score. Secondly, 
whereas empirical studies normally estimate reliability by correlating total scores for each person 
or each item, thereby obtaining one correlation coefficient for each type of scale, the present 
study also obtained a finer measure of reliability by calculating correlations for each individual 
subject across items (for test reliability) and for each item across subjects (for rater reliability). 
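
The four reliability scores distinguished above can be sketched in code. The layout assumed here, one list of item ratings per subject for each session, and the function names and ratings are illustrative and are not taken from the original analysis.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def individual_test_reliability(session1, session2):
    """One r per subject: that subject's two sets of ratings correlated over items."""
    return [pearson_r(a, b) for a, b in zip(session1, session2)]


def individual_rater_reliability(session1, session2):
    """One r per item: the two sets of ratings of that item correlated over subjects."""
    return [pearson_r(a, b) for a, b in zip(zip(*session1), zip(*session2))]


def total_test_reliability(session1, session2):
    """Single r: each subject's total over items, correlated across subjects."""
    return pearson_r([sum(row) for row in session1], [sum(row) for row in session2])


def total_rater_reliability(session1, session2):
    """Single r: each item's total over subjects, correlated across items."""
    return pearson_r([sum(col) for col in zip(*session1)],
                     [sum(col) for col in zip(*session2)])


# Hypothetical seven-category ratings: rows are subjects, columns are items.
session1 = [[1, 3, 5, 7], [2, 2, 6, 6], [4, 5, 3, 7]]
session2 = [[1, 4, 5, 7], [2, 3, 6, 5], [3, 5, 4, 7]]
print(individual_test_reliability(session1, session2))
print(individual_rater_reliability(session1, session2))
print(total_test_reliability(session1, session2))
print(total_rater_reliability(session1, session2))
```

The individual-score versions return one coefficient per subject or per item, whereas the total-score versions return a single coefficient per scale.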


Experiment I 

In Expt. I, three types of scale were investigated: seven-category (7CAT), 11-category (11CAT), and 
continuum (CTM). A repeated measures design was adopted, subjects providing judgements on both tasks on 
all three scales. At the end of the experiment, each subject filled out a questionnaire designed to isolate 
which scale (if any) he preferred (face validity). 


Method 


Subjects. Subjects were 30 undergraduate students (20 males, 10 females) at McGill University. They were 
drawn from a variety of disciplines and were paid $2.00 for their participation. 


Rating scales. Three types of rating scale were used: the continuum (CTM) was a continuous line 16.5 cm 
long; the seven- and 11-category scales (7CAT, 11CAT) were lines of similar lengths marked off into seven 
and 11 equal-sized categories respectively (see Fig. 1). 


Procedure. Subjects were required to make two kinds of judgement. Firstly, in the adjective task (ADJ), they 
received a booklet in which they judged the extent to which they thought each of ten adjectives (friendly, 
wealthy, artistic, boastful, sociable, egotistic, sincere, competitive, religious, excitable) was descriptive of 
French Canadians as a group, relative to English Canadians. The more to the left a subject marked the line, 
the less descriptive he thought the adjective was of French-Canadians; the more to the right, the more 
descriptive he judged the adjective to be. When the ratings of the adjectives had been made, subjects were 
administered the tone task (TONE), in which they judged the pitch of ten pure tones (380, 400, 430, 482, 
515, 545, 570, 610, 640, 670 c/sec) relative to the standard tones. Each trial consisted of the presentation of a 
low standard tone (355 c/sec) followed by a high standard tone (720 c/sec), and then one of the test tones. 
Subjects were instructed to regard the first two tones as representing the extreme left-hand side and 
right-hand side of the line respectively, and to place a mark on the line which they felt corresponded to the 
test tone. Both the stimulus and interstimulus presentation times were 1 sec, and all instructions and tones 
were presented via a tape-recorder. 

All subjects received the ADJ task first, followed by the TONE task, and, within each task, they made 
judgements on all three types of scale (7CAT, 11CAT, CTM), i.e. the ten ADJ stimuli and ten TONE stimuli 
were judged three times by each subject. Care was taken to ensure that the stimuli were presented in 
different orders on each occasion and that the scales themselves were administered in a different order for 
the two tasks. Subjects were tested individually by one of three experimenters. 

Two weeks later, subjects returned and repeated their judgements, the ADJ task again being administered 
first. However, within each task, the order in which they received the three scales was altered from the first 
occasion, and, for each scale, the order of presentation of the ten adjectives and the test tones was changed 
[Figure 1 appeared here. Verbal anchors recoverable from the figure, reading from left to right: 
5LAB (ADJ): Not at all, Barely, Hard to say, Quite, Highly. 
5LAB (TONE): Very close, Quite close, Closer to neither one or other, Quite close, Very close (LOW to HIGH). 
7LAB (ADJ): Not at all, Barely, Not very, Hard to say, Quite, Very, Highly. 
7LAB (TONE): Extremely, Very, Quite, Closer to neither one, Quite, Very, Extremely (LOW to HIGH).] 

Figure 1. Scales investigated in Expt. I (7CAT, 11CAT, CTM) and Expt. II (all scales). 


Finally, subjects filled out a questionnaire (see Appendix) designed to elicit their preferences among the 
three scales and to obtain their evaluations of their own performance. 


Coding of judgements. Responses on the continuum were measured to the nearest millimetre from the left-hand 
side of the line; on the 7CAT and 11CAT scales, they were simply assigned the number of the 
category into which they fell, again counting from the left. 


Scoring procedures. Both reliability and validity (accuracy) were measured by the Pearson product moment 
correlation coefficient r. Test-retest reliability was assessed by correlating the ratings made in session 1 with 
those in session 2; validity was assessed by correlating each subject’s ratings with the correct responses for 
the scale in question. The correct response was that category or point on the line into which the tone fell, 
assuming that the subjective scale for pitch was linear, a reasonable assumption over the range involved 
(Stevens & Volkmann, 1940; Stevens & Galanter, 1957). 
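
The scoring of validity on the TONE task can be illustrated with a short sketch. It assumes that 'linear' means that a tone's correct position on the line is proportional to its distance in c/sec between the two standard tones, and that category boundaries are equally spaced; the function names and the subject's ratings are hypothetical.

```python
import math

LOW_HZ, HIGH_HZ = 355.0, 720.0   # standard tones (left and right ends of the line)
LINE_MM = 165.0                   # length of the rating line
TONES_HZ = [380, 400, 430, 482, 515, 545, 570, 610, 640, 670]


def correct_point_mm(freq_hz):
    """Correct position on the continuum, in mm from the left, under the linear assumption."""
    return (freq_hz - LOW_HZ) / (HIGH_HZ - LOW_HZ) * LINE_MM


def correct_category(freq_hz, k):
    """Correct category (1..k) on a line divided into k equal-sized categories."""
    fraction = (freq_hz - LOW_HZ) / (HIGH_HZ - LOW_HZ)
    return min(k, int(fraction * k) + 1)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


correct_7cat = [correct_category(f, 7) for f in TONES_HZ]
# Hypothetical ratings by one subject on the seven-category scale.
ratings_7cat = [1, 2, 2, 3, 4, 5, 5, 6, 6, 7]
validity = pearson_r(ratings_7cat, correct_7cat)
print(correct_7cat, round(validity, 3))
```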

Test reliability (which measures the consistency of measurement of the individual subject) was calculated 
for individual scores by correlating each subject’s pairs of responses over the ten stimuli. In this fashion, a 
correlation coefficient measuring test reliability was obtained for each subject on each scale on both tasks. In 
addition, each subject's responses on CTM were transformed into seven- and 11-category scales, and test 
reliability coefficients calculated for the transformed data. Test reliability was calculated for total scores by 
correlating the sums of each subject’s responses on each session; this procedure, which produced a single 
correlation coefficient for each scale, was only performed on the ADJ task data, since it made little sense to 
sum each subject’s responses on the ten tone judgements. Test reliability coefficients for total scores on the 
ADJ task were also calculated from the transformed data. 
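
The transformation of continuum responses into category data can be sketched as follows. The exact rule used is not stated in the text; the sketch assumes that the 165 mm line is cut into k equal segments and that a mark falling exactly on a boundary is placed in the higher category. The function names and the millimetre values are hypothetical.

```python
import math


def to_category(mm, k, line_mm=165):
    """Convert a continuum response (mm from the left) to a category from 1 to k."""
    if not 0 <= mm <= line_mm:
        raise ValueError("response lies outside the line")
    return min(k, int(mm * k / line_mm) + 1)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical CTM responses (mm) by one subject to ten stimuli in each session.
session1_mm = [12, 40, 55, 80, 95, 110, 120, 135, 150, 160]
session2_mm = [20, 35, 60, 70, 100, 105, 130, 128, 155, 150]

r_ctm = pearson_r(session1_mm, session2_mm)
r_7 = pearson_r([to_category(v, 7) for v in session1_mm],
                [to_category(v, 7) for v in session2_mm])
r_11 = pearson_r([to_category(v, 11) for v in session1_mm],
                 [to_category(v, 11) for v in session2_mm])
print(round(r_ctm, 3), round(r_7, 3), round(r_11, 3))
```

Comparing the coefficient from the raw continuum data with those from the same data after transformation isolates the effect of categorization itself, since the underlying judgements are identical.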

In order to assess rater reliability (which measures the consistency with which stimuli are rated) for 
individual scores, the pairs of responses on each stimulus were correlated over the 30 subjects. In this 
fashion, a measure of rater reliability was obtained for each task (ADJ, TONE) on each of the three scales. 
In addition, rater reliability coefficients were calculated for the transformed data. Rater reliability coefficients 
for actual and transformed data were also obtained for total scores on both tasks. 

Using similar procedures, test validity (individual scores) and rater validity (total scores) were calculated 
for both session 1 and session 2, on the TONE task. It was not possible to calculate validity coefficients on 
the ADJ task, since no external criterion was available. 


Results 


For purposes of statistical analysis, all individual score reliability and validity coefficients were 
transformed into Zr scores; however, the data are presented here in their original form. 
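
Assuming that 'Zr' denotes Fisher's r-to-z transformation, which is the usual reading, the conversion can be sketched as below. The coefficients in the example are hypothetical, and averaging in the z metric is shown only as one common use of the transformation, not as the procedure behind the tabled means.

```python
import math


def fisher_z(r):
    """Fisher's transformation: z = 0.5 * ln((1 + r) / (1 - r)) = atanh(r)."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform a z value to a correlation coefficient."""
    return math.tanh(z)


def mean_r_via_z(coefficients):
    """Average correlations in the z metric and return the result as r."""
    z_values = [fisher_z(r) for r in coefficients]
    return inverse_fisher_z(sum(z_values) / len(z_values))


# Hypothetical reliability coefficients for three subjects.
coefficients = [0.95, 0.80, 0.90]
print([round(fisher_z(r), 3) for r in coefficients], round(mean_r_via_z(coefficients), 3))
```

The transformation stretches the scale near 1, so that differences among high coefficients are not compressed when they enter an analysis of variance.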

Data for individual score test reliability, rater reliability and test validity are shown in Tables 
1, 2 and 3. Actual and transformed data were analysed separately with 3 × 2 (scale, task for 
reliability; scale, session for validity) analyses of variance (repeated measures on both factors, 
Winer, 1962, p. 324). No significant effects were found on test reliability, but on rater reliability, 
the effect of task was significant: on actual data, F = 30.22, d.f. = 1, 18, P < 0.01; on transformed 
data, F = 18.67, d.f. = 1, 18, P < 0.01. Inspection of Table 2 reveals that this effect was due to 
lower reliability on the TONE task. Finally, on test validity the effect of scale (on transformed 
data) was significant, F = 4.47, d.f. = 2, 58, P < 0.05. Newman-Keuls tests among the three 
scales indicated that 7CAT < CTM, P < 0.05, all other comparisons being non-significant. Since 
the validity coefficient drops off on the 7CAT scale on transformed data but not on actual data, 
it may be inferred that validity is relatively better on the 7CAT scale. However, the absolute 


Table 1. Experiment I: Mean test reliability coefficients for individual scores 


         Actual data        Transformed data 
Scale    ADJ      TONE      ADJ      TONE 
7CAT     0.884    0.880     0.878    0.854 
11CAT    0.899    0.864     0.870    0.870 
CTM      0.898    0.855     0.898    0.855 


Table 2. Experiment I: Mean rater reliability coefficients for individual scores 


         Actual data        Transformed data 
Scale    ADJ      TONE      ADJ      TONE 
7CAT     0.719    0.291     0.568    0.306 
11CAT    0.684    0.396     0.624    0.275 
CTM      0.622    0.346     0.622    0.346 


Table 3. Experiment I: Mean test validity coefficients for individual scores (TONE task) 


         Actual data                Transformed data 
Scale    Session 1    Session 2     Session 1    Session 2 
7CAT     0.918        0.922         0.898        0.912 
11CAT    0.908        0.908         0.904        0.922 
CTM      0.912        0.926         0.912        0.926 
size of the advantage is not large; the validity coefficient on transformed data only falls from 
0.912 to 0.898 (see Table 3). 

Data for total score test reliability, rater reliability and rater validity are presented in Tables 4, 
5 and 6. No statistical tests were performed on these data, but it can be seen that there are no 
systematic effects of scale, task or session on any of the criteria. 


Table 4. Experiment I: Test reliability coefficients for total scores (ADJ task) 


Scale    Actual data    Transformed data 
7CAT     0.705          0.697 
11CAT    0.762          0.720 
CTM      0.728          0.728 


Table 5. Experiment I: Rater reliability coefficients for total scores 


         Actual data        Transformed data 
Scale    ADJ      TONE      ADJ      TONE 
7CAT     0.985    0.996     0.992    0.996 
11CAT    0.990    0.993     0.995    0.995 
CTM      0.989    0.960     0.989    0.960 


Table 6. Experiment I: Rater validity coefficients for total scores (TONE task) 


         Actual data                Transformed data 
Scale    Session 1    Session 2     Session 1    Session 2 
7CAT     0.980        0.974         0.988        0.988 
11CAT    0.983        0.979         0.982        0.986 
CTM      0.985        0.990         0.985        0.990 


Results on the questionnaire are shown in the Appendix. An overall χ² test was carried out on 
each question; if it proved to be non-significant, implying no interaction between task and 
response, a second χ² was performed on the total frequencies in each response category. It 
appears that subjects generally preferred the continuum (question 1), and felt that it allowed them 
to be more consistent (question 2), and perhaps more accurate (questions 3, 4); also, subjects 
indicated that they were more unsure on the TONE task than on the ADJ task (question 6), but 
that, when they were unsure, they tended to pick the middle category more often on the ADJ 
task (question 5); finally, subjects’ judgemental strategies appeared to vary according to the task 
(question 7): on ADJ, judgements were most often formed prior to marking the scale, whereas 
on TONE, judgements were also often finalized by actually moving along the line. 


Discussion 

The major result of Expt. I is that reliability is unaffected by scale type, whereas the validity 
(accuracy) is slightly better on the seven-category scale. Thus, although subjects stated that they 
preferred the continuum, and that it permitted them to be more consistent and perhaps more 
accurate, it did not, in fact, offer any advantage in terms of reliability or validity. 
One feature of this experiment which may have concealed differences among the three scales 
is the correlated groups design. Although it is difficult to envisage how a memory factor could be 
operative during the TONE task, it is possible that, in making subsequent adjective judgements, 
subjects could have recalled their earlier responses on individual stimuli. Thus, while the 
repeated measures design provided an opportunity to obtain subjects’ preferences after having 
been exposed to all three scales, it may have hidden any real differences among the scales, 
particularly on the ADJ task. This possibility is also consistent with the fact that the only 
significant effect of scale type occurred on the TONE task. Accordingly, Expt. II was designed to 
replicate and extend Expt. I using an independent groups design. 


Experiment II 

In addition to the three scale types (7CAT, 11CAT, CTM) studied in Expt. I, a five-category scale (5CAT) 
was also employed. Furthermore, two other conditions were included: an anchored (labelled) five-category 
scale (5LAB) and an anchored seven-category scale (7LAB). 

It was not clear whether verbal anchors would be expected to influence reliability or validity. Cronbach 
(1970, p. 584) argues that anchoring reduces the ambiguity of the meaning of scale points, and two studies 
(Bendig, 1953; Bendig & Hughes, 1953) claim to have demonstrated small increases in reliability and 
information transmitted as a function of the number of verbal anchors (centre, ends, centre plus ends). 
However, close inspection of their results reveals that this is not clear; the increases reported by Bendig 
were not statistically significant, and the gains in information transmitted reported by Bendig & Hughes were 
not subjected to statistical analysis; moreover, Finn (1972), who did analyse his results statistically, failed to 
find any effect of the number of verbal anchors on reliability. There is even evidence (Johnson, 1972) that 
verbal anchoring reduces accuracy in a psychophysical (weight discrimination) task. 

A final novel feature of this experiment was that subjects in the continuum condition were instructed to 
place confidence intervals around their ratings on the line; given the mean size of these intervals it was 
possible to estimate the number of categories which subjects were effectively using. 
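
One simple way of turning such intervals into an estimate of the effective number of categories is to divide the length of the line by the mean width of the intervals. This rule is an assumption adopted for illustration; the text does not state the formula actually used. The function name and the bracket positions are hypothetical.

```python
def effective_categories(intervals, line_mm=165.0):
    """Estimate how many categories a rater is effectively using.

    intervals: list of (left_mm, right_mm) brackets drawn around ratings.
    Assumed rule: line length divided by mean bracket width.
    """
    widths = [right - left for left, right in intervals]
    if not widths or any(w <= 0 for w in widths):
        raise ValueError("each interval needs a positive width")
    mean_width = sum(widths) / len(widths)
    return line_mm / mean_width


# Hypothetical brackets (mm from the left) around three ratings.
brackets = [(0, 33), (50, 83), (100, 133)]
print(effective_categories(brackets))
```

Narrow brackets imply many effective categories and wide brackets imply few, so the estimate moves in the direction the procedure requires.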


Method 


Subjects. Subjects consisted of a new sample of 120 McGill University students paid $2.00 for their
participation. Twenty subjects were randomly assigned to each of the following conditions: 5CAT, 5LAB,
7CAT, 7LAB, 11CAT, and CTM. Unfortunately, some subjects failed to return for the second session,
resulting in the following distribution of sample sizes across the conditions: 5CAT-20, 5LAB-19, 7CAT-19,
7LAB-17, 11CAT-20, and CTM-20.


Rating scales. All rating scales were 16.5 cm in length, marked off and labelled as shown in Fig. 1.


Procedure. The procedure was identical to that in Expt. I, except that subjects were only exposed to one 
kind of scale. Subjects in the CTM condition were also asked to place brackets around their judgements to 
indicate how confident they felt, narrower brackets signifying more confidence and wider brackets less 
confidence. 


Results 


As in Expt. I, statistical analyses of all individual score reliability and validity coefficients were
performed on Zr scores, although the data are presented here in the form of correlation
coefficients.
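
The transformation referred to here can be illustrated briefly. The sketch below assumes that the Z scores are Fisher's r-to-z transformation, z = artanh(r); the function names and the example coefficients are illustrative and are not taken from the study. It shows coefficients being averaged on the z scale and then converted back to r for presentation.

```python
import math

def fisher_z(r):
    """Fisher r-to-z transformation: z = atanh(r)."""
    return math.atanh(r)

def inverse_fisher_z(z):
    """Back-transform a z value to a correlation coefficient."""
    return math.tanh(z)

def mean_correlation(rs):
    """Average correlations on the z scale, then convert back to r."""
    zs = [fisher_z(r) for r in rs]
    return inverse_fisher_z(sum(zs) / len(zs))

# Hypothetical reliability coefficients for three subjects (not data from the study).
example_rs = [0.90, 0.80, 0.70]
mean_r = mean_correlation(example_rs)
```

Because the transformation stretches values near 1, a mean computed in this way lies slightly above the arithmetic mean of the raw coefficients.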


Test reliability (individual scores). The mean test reliability coefficients for each task on the four
main scales (5CAT, 7CAT, 11CAT, CTM), for both actual and transformed data, are shown in
Table 7. Again, the actual and transformed data were analysed separately; actual data were
analysed with a 4 × 2 (scale, task) analysis of variance with repeated measures on the second
factor (Winer, 1962, p. 306), whereas a 4 × 2 (scale, task) analysis of variance with repeated
measures on both factors was carried out on the transformed data.

The first analysis produced one significant effect, the scale × task interaction, F = 3.28,
d.f. = 3, 75, P < 0.05. Post hoc Newman-Keuls tests revealed that, on the ADJ task,
5CAT > 7CAT = CTM (P < 0.05), while on the TONE task, all four scales were equivalent. On
the transformed data, both the main effect of scale (F = 14.74, d.f. = 3, 57, P < 0.01) and the
scale × task interaction (F = 10.46, d.f. = 3, 57, P < 0.01) were significant. In this case
Newman-Keuls tests again showed no differences among the scales on the TONE task;
however, on ADJ, 5CAT < 7CAT = CTM (P < 0.01). Thus, when continuous data from the ADJ
task were categorized into five categories, reliability declined, whereas empirical reliability on the
5CAT scale was best. These results combine to render the 5CAT scale clearly superior, at least
for the ADJ task.


Table 7. Experiment II: Mean test reliability coefficients for individual scores

            Actual data         Transformed data
Scale       ADJ       TONE      ADJ       TONE
5CAT        0.906     0.805     0.641     0.786
7CAT        0.816     0.876     0.768     0.807
11CAT       0.822     0.858     0.754     0.822
CTM         0.791     0.820     0.791     0.820


A separate 2 × 2 × 2 (anchor, scale, task) analysis of variance (with repeated measures on the
last factor) was carried out on the following four groups: 5CAT, 7CAT, 5LAB, 7LAB (see Table
8), but no significant effect of anchoring was found. Only the scale × task interaction was
significant (F = 7.36, d.f. = 1, 71, P < 0.01), as would be expected from the results just presented.


Rater reliability (individual scores). Separate analyses of variance were performed on the actual
and transformed rater reliability data presented in Table 9. On actual data, significant F values
were found for scale (F = 3.25, d.f. = 3, 54, P < 0.05), task (F = 19.00, d.f. = 1, 18, P < 0.01) and
the scale × task interaction (F = 4.20, d.f. = 3, 54, P < 0.05). Inspection of Table 9 shows that
reliability on the TONE task was lower than on the ADJ task, and Newman-Keuls tests revealed
that, on ADJ, 11CAT < 7CAT = CTM = 5CAT (11CAT < CTM, P < 0.05; 11CAT < 5CAT,
P < 0.01), while on TONE, there were no significant differences among the scales. On the
transformed data, there were significant effects of scale (F = 4.47, d.f. = 3, 54, P < 0.01) and
task (F = 4.42, d.f. = 1, 18, P < 0.05). Again, the TONE task was less reliable (as would be
expected since this analysis was carried out on transformed actual CTM data), and
Newman-Keuls tests on the scale means indicated that 5CAT < 7CAT = 11CAT = CTM
(P < 0.05).


Table 8. Experiment II: Mean test reliability coefficients for anchored (labelled) category
conditions (individual scores)

Scale          ADJ       TONE
Categories
  5CAT         0.906     0.805
  7CAT         0.816     0.876
Anchors
  5LAB         0.814     0.788
  7LAB         0.798     0.880


Table 9. Experiment II: Mean rater reliability coefficients for individual scores

            Actual data         Transformed data
Scale       ADJ       TONE      ADJ       TONE
5CAT        0.742     0.136     0.518     0.424
7CAT        0.641     0.370     0.709     0.470
11CAT       0.404     0.307     0.716     0.516
CTM         0.700     0.462     0.700     0.462


Since, on the ADJ task, the reliability of transformed data drops off on 5CAT, but remains
constant on actual data, it may be inferred that rater reliability is relatively better on the 5CAT
scale than on the others; and since the reliability of the transformed data on 11CAT does not
drop off, but does on actual data, it may be inferred that the 11CAT scale is less reliable on the
ADJ task.

Since the reliability of 5CAT on the transformed data also drops off on the TONE task (the
scale × task interaction was not significant), it may appear that rater reliability is also relatively
better on 5CAT for the TONE task. However, Table 9 suggests that this inference is rather
weak; the mean rater reliability coefficient is low on 5CAT on TONE; since a Newman-Keuls
test showed that 5CAT < CTM, P < 0.01, it seems safest to conclude that scale type has little
effect on the rater reliability of TONE judgements, a view which is supported by further results
presented below.

A separate 2 × 2 × 2 (anchor, scale, task) analysis of variance was performed on the four
conditions (5CAT, 7CAT, 5LAB, 7LAB) shown in Table 10, but no significant effects of
anchoring were found. However, the main effect of scale (F = 7.50, d.f. = 1, 18, P < 0.01) and the
scale × task interaction (F = 13.59, d.f. = 1, 18, P < 0.01) were significant. Post hoc t tests (Winer,
1962, p. 300) revealed that, on the ADJ task, 5 = 7, whereas, on TONE, 5 < 7, P < 0.01. The
results for ADJ confirm those reported above, and show that, for TONE, the decrement on
5CAT which was almost significant (P < 0.10) now becomes highly significant with the anchored
conditions included. Since the reliability coefficient for transformed data also declined on 5CAT,
it may be concluded that scale type has no systematic effect on the rater reliability of TONE
judgements.


Table 10. Experiment II: Mean rater reliability coefficients for anchored (labelled) category
conditions (individual scores)

Scale          ADJ       TONE
Categories
  5CAT         0.742     0.136
  7CAT         0.641     0.370
Anchors
  5LAB         0.554     -0.094
  7LAB         0.614     0.418

Test validity (individual scores). The mean test validity coefficients for each session on the
TONE task are shown in Table 11. Separate two-way analyses of variance conducted on the
actual and transformed data indicated main effects of scale for both analyses; for actual data,
F = 5.23, d.f. = 3, 76, P < 0.01; for transformed data, F = 9.26, d.f. = 3, 76, P < 0.01.
Newman-Keuls tests revealed that, for both sets of data, 5CAT < 7CAT = 11CAT = CTM
(P < 0.01). Thus, it appears that the effects of scale on test validity are few and slight.
Analysis of variance of the anchor conditions (see Table 12) produced a significant effect of
scale (F = 11.80, d.f. = 1, 73, P < 0.01) as would be expected from the results above, but no
significant effects of anchor, session or any interaction. Thus validity (accuracy) on the TONE
task does not seem to be influenced by verbal anchoring.


Table 11. Experiment II: Mean test validity coefficients for individual scores (TONE task)

            Actual data               Transformed data
Scale       Session 1   Session 2     Session 1   Session 2
5CAT        0.833       0.840         0.787       0.838
7CAT        0.892       0.908         0.836       0.898
11CAT       0.904       0.908         0.858       0.897
CTM         0.855       0.902         0.855       0.902


Table 12. Experiment II: Mean test validity coefficients for anchored category conditions
(individual scores, TONE task)

Scale          Session 1   Session 2
Category
  5CAT         0.833       0.840
  7CAT         0.892       0.908
Anchor
  5LAB         0.846       0.865
  7LAB         0.894       0.902


Total scores. Data for total score test reliability, rater reliability, and rater validity are presented 
in Tables 13, 14, and 15. Again, no statistical tests were performed on these data, but there are 
clearly no systematic effects of scale, task or session on rater reliability and validity. However, 
there would appear to be a drop in test reliability on the 11CAT scale (ADJ task); although a 
similar result did not occur in test reliability with individual scores, it is notable that individual
score rater reliability was lower on the ADJ task on the 11CAT scale. 


Confidence judgements. Various correlation coefficients (see Table 16) were computed on the 
confidence judgements made by subjects in the CTM condition (three subjects omitted confidence 
judgements on some trials, leaving n = 17): test-retest reliability, correlation of judgements on 
the two tasks, and correlations of confidence with validity (accuracy) and reliability. It can be 
seen that the confidence judgements were fairly stable over time (particularly on the TONE task) 
and test (particularly on session 2); however, they were unrelated to reliability or validity. 


Table 13. Experiment II: Test reliability coefficients for total scores (ADJ task)

Scale       Actual data    Transformed data
5CAT        0.714          0.746
5LAB        0.683          0.746
7CAT        0.742          0.792
7LAB        0.791          0.792
11CAT       0.284          0.743
CTM         0.803          0.803

Table 14. Experiment II: Rater reliability coefficients for total scores

            Actual data         Transformed data
Scale       ADJ       TONE      ADJ       TONE
5CAT        0.969     0.981     0.928     0.975
5LAB        0.992     0.988     0.928     0.975
7CAT        0.942     0.985     0.978     0.980
7LAB        0.983     0.992     0.978     0.980
11CAT       0.970     0.996     0.974     0.980
CTM         0.959     0.980     0.959     0.980


Table 15. Experiment II: Rater validity coefficients for total scores (TONE task)

            Actual data               Transformed data
Scale       Session 1   Session 2     Session 1   Session 2
5CAT        0.926       0.927         0.912       0.916
5LAB        0.955       0.932         0.912       0.916
7CAT        0.972       0.977         0.952       0.971
7LAB        0.972       0.972         0.952       0.971
11CAT       0.985       0.988         0.976       0.975
CTM         0.972       0.966         0.976       0.966


Table 16. Experiment II: Relationship between confidence judgements and various criteria

Correlation between                  Session 1   Session 2
ADJ and TONE                         0.476       0.673
Confidence and test validity         0.267       0.014
Confidence and test reliability
  ADJ                                -0.094      0.109
  TONE                               0.145       -0.033

                                     ADJ         TONE
Test-retest reliability              0.479       0.730

Note. Critical values are 0.482 (0.05) and 0.606 (0.01).
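
The critical values given in the note can be checked against the t distribution. The following is a minimal sketch, assuming a two-tailed test with n = 17 (hence 15 d.f.) and taking the critical t values 2.131 and 2.947 from standard tables; the function name is illustrative and does not come from the paper.

```python
import math

def critical_r(t_crit, n):
    """Smallest |r| reaching significance, given the critical t for n - 2 d.f."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

N_SUBJECTS = 17             # subjects with complete confidence judgements
T_05, T_01 = 2.131, 2.947   # two-tailed critical t for 15 d.f. (standard tables)

r_05 = critical_r(T_05, N_SUBJECTS)
r_01 = critical_r(T_01, N_SUBJECTS)
```

Under these assumptions the two values agree with those in the note to three decimal places.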


Table 17. Experiment II: Mean size of confidence judgements in CTM conditions (mm)

Session     ADJ      TONE
1           36.0     26.6
2           34.6     30.1


Table 17 shows the mean size of the confidence intervals in the four CTM conditions. A 2 × 2
analysis of variance (repeated measures on both factors) showed that only the main effect of
task (F = 7.66, d.f. = 1, 16, P < 0.05) was significant, reflecting the fact that subjects felt more
confident on the TONE task than on the ADJ task. Finally, estimates of the number of
categories effectively being utilized on the CTM were obtained by dividing the mean confidence
judgements for ADJ (35.30 mm) and TONE (27.40 mm) into the length of the line (165 mm) and
were found to be five (4.66) and six (6.25) respectively.
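
The estimate just described is a simple division, sketched below with the figures as printed in the text. The function name is illustrative. Dividing the printed values gives roughly 4.67 and 6.02; the second figure differs somewhat from the reported 6.25, which may reflect rounding or the exact means used in the original computation, and the sketch does not attempt to resolve this.

```python
LINE_LENGTH_MM = 165.0  # length of the rating line

def effective_categories(mean_interval_mm, line_length_mm=LINE_LENGTH_MM):
    """Number of interval-sized segments that fit along the line."""
    return line_length_mm / mean_interval_mm

adj_categories = effective_categories(35.30)   # mean ADJ interval as stated in the text
tone_categories = effective_categories(27.40)  # mean TONE interval as stated in the text
```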

In summary, the major results of Expt. II are as follows: (a) on the ADJ task, for both test
and rater reliability, the 5CAT scale appears to be best, and the 11CAT scale appears to be
weak; (b) on the TONE task, both test and rater reliability are invariant over scale type; (c) test
validity is unaffected by scale type (TONE task); (d) neither reliability nor validity is influenced
by the presence of verbal anchors; and (e) given freedom to employ all points on the line in
CTM, subjects appear to be effectively utilizing about five categories on the ADJ task and six
categories on the TONE task.


General discussion 


Although subjects in Expt. I stated that they preferred the continuous scale, and that they felt 
that they had performed more consistently and perhaps also more accurately on it, neither 
experiment provided any evidence that the CTM was more reliable (allowed more discrimination) 
or more valid (allowed more accuracy) than the category scales. While these findings support
the proposal (Symonds, 1924; Matell & Jacoby, 1971; Ramsay, 1973) that subjects might find the
CTM most pleasing to use, they also agree with the general conclusion reached in the 
Introduction that there is no empirical advantage of a large number of scale categories. 

Indeed, the results of the present experiments suggest that a smaller number of categories (five 
or six) should be used. Although there were no major effects of scale type on validity (only a 
slight superiority of the 7CAT scale in Expt. I), Expt. II demonstrated that the 5CAT scale was
the most reliable (on the ADJ task) and that there were weaknesses in the 11CAT scale 
(individual rater and total test reliability on the ADJ task); furthermore, subjects using the 
continuous scale appeared to be operating essentially with five or six categories. 

The proposal to use five or six categories, and no fewer, is also consistent with the evidence 
reviewed earlier which suggested that direct measures of discrimination and validity were
adversely affected when the number of scale categories fell below five; and it satisfies other
criteria such as ease of coding or scoring (Komorita & Graham, 1965; Green & Rao, 1970; 
Matell & Jacoby, 1971; Ramsay, 1973) which call for a smaller number of categories. In addition, 
the possibility of six categories recognizes Matell & Jacoby’s (1972) advice that, if it is desirable to 
minimize the number of ‘uncertain’ responses, an even number of categories should be employed. 

The notion that the nature of the task should be considered in deciding the number of 
categories to use (Cronbach, 1946; Bendig, 1954a; Garner, 1960; Komorita & Graham, 1965; 
Masters, 1974) receives support from the present study, since a number of task differences 
emerged. For example, reliability in the ADJ task was superior on the 5CAT scale, but, on the
TONE task, it was unaffected by the scale type; and the confidence judgement data suggested
that the CTM was operating as a 5CAT scale on the ADJ task but as a 6CAT scale on the TONE
task. Moreover, although not strictly relevant to the question of the optimal number of 
categories, it is notable that, in both experiments, individual rater reliability was lower on the 
TONE task, and that subjects’ judgemental strategies appeared to differ: on the ADJ task, 
responses represented preformed judgements, whereas on the TONE task subjects also used the 
line to arrive at their decision (Expt. I questionnaire). 

Analyses of the transformed data obtained from the continuous scale conditions generally
support the conclusion (p. 187) that grouping lowers the correlation coefficient, particularly with
fewer than six categories: although transformation into seven categories caused a decline in r in
one instance (test validity, Expt. I), the formation of five categories from continuous data always
gave a significantly lowered estimate (at least with individual scores). This result also argues
against the suggestion (Matell & Jacoby, 1971) that data should be gathered on a continuous
scale (or free-response format) and converted to two or three categories for scoring purposes.
While this may satisfy the subjects’ preferences, it will almost certainly result in a loss of
discriminative power.
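
The transformed scores discussed here come from regrouping continuous line ratings into a fixed number of categories. The sketch below is a minimal illustration, assuming equal-width categories along the 165 mm line (the exact grouping rule is not spelled out in this section); the function name and the example marks are hypothetical.

```python
LINE_LENGTH_MM = 165.0

def to_category(position_mm, n_categories, line_length_mm=LINE_LENGTH_MM):
    """Map a mark on the line to a category 1..n_categories (equal-width bins)."""
    if not 0.0 <= position_mm <= line_length_mm:
        raise ValueError("mark lies outside the line")
    width = line_length_mm / n_categories
    # The right-hand end of the line belongs to the last category.
    return min(int(position_mm // width) + 1, n_categories)

# Hypothetical marks (mm from the left end); not data from the study.
marks = [10.0, 40.0, 82.5, 120.0, 165.0]
five_point = [to_category(m, 5) for m in marks]
seven_point = [to_category(m, 7) for m in marks]
```

With few categories, marks that differ on the line collapse to the same score, which is the mechanism by which grouping can lower a correlation coefficient.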


Taken together, the results of the present experiments and of the studies reviewed in the 
Introduction suggest that investigators should generally be advised to use five- or six-category 
rating scales; a smaller number of categories may result in a loss of discriminative power and 
validity, and a larger number conveys no psychometric advantage. However, since there does 
not appear to be any loss of reliability or validity with a larger number of categories (including 
the continuous line), more than five or six may be recommended if other considerations (e.g.
satisfying subject preferences) so dictate.


Finally, although the present study is by no means exhaustive in this respect, it would appear 
that the presence of verbal anchors (labels) makes little difference to reliability or to validity. 
The former finding is consistent with the results of Bendig (1953), Bendig & Hughes (1953), and 
Finn (1972), but the latter contradicts Johnson’s (1972) evidence that verbal anchors lower 
accuracy on a psychophysical task. In order to clarify this issue, a more thorough and 
systematic study of the effects of verbal anchoring on various psychometric criteria will be
required.

Appendix


Summary and analysis of questionnaire responses 


Question 1. On which scale (if any) did you prefer making your judgements?

          No preference    7CAT    11CAT    CTM
ADJ       2                5       4        15
TONE      3                7       8        13

(2 × 4) χ² = 1.58 (n.s.)
For response totals, χ² = 19.98, P < 0.01.

Question 2. On which scale (if any) do you think you were most consistent?

          Don’t know    7CAT    11CAT    CTM
ADJ       9             6       4        11
TONE      9             5       2        12

(2 × 4) χ² = 0.81 (n.s.)
For response totals, χ² = 11.65; P < 0.01.


Question 3. On which scale (if any) do you think you were most accurate?

          Don’t know    7CAT    11CAT    CTM
ADJ       6             3       10       10
TONE      6             7       7        8

(2 × 4) χ² = 2.33 (n.s.)
For response totals, χ² = 3.14 (n.s.)


Question 4. Do you think that the category scales generally forced you to be less accurate than
you could have been?

Yes    19
No     10

χ² = 9.53, P < 0.01.


Question 5. Did you tend to pick the middle category if unsure?

          Don’t know    Yes    No
ADJ       2             18     8
TONE      2             8      19

(2 × 3) χ² = 8.31; P < 0.05.

Question 6. How often were you unsure?

          Very    Quite    Seldom
ADJ       2       2        25
TONE      4       15       11

(2 × 3) χ² = 16.03; P < 0.01.
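
The 2 × 3 statistics for Questions 5 and 6 can be reproduced from the tabulated counts. The following is a minimal sketch, assuming an ordinary Pearson chi-square without continuity correction; the function name is illustrative and not from the paper.

```python
def chi_square(table):
    """Pearson chi-square for an r x c contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows are ADJ and TONE; counts as tabulated for Questions 5 and 6.
question_5 = [[2, 18, 8], [2, 8, 19]]
question_6 = [[2, 2, 25], [4, 15, 11]]

chi_q5, df_q5 = chi_square(question_5)
chi_q6, df_q6 = chi_square(question_6)
```

Each statistic has 2 d.f. and falls within rounding distance of the value reported for that question.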

Question 7. Which strategy best describes how you made your judgements on each scale?

(i) You arrived at your judgement before you made a mark on the line, i.e. the mark you put
on the scale merely reflected the direction and magnitude of an already arrived at judgement.

(ii) Before considering where to put your mark, you already had a vague idea as to its
direction and magnitude, and then moving along the scale helped to make that idea more
accurate.

(iii) You did not at all separate your judgement from making your mark on the scale.

(iv) None of these.

              7CAT    11CAT    CTM    Totals
ADJ
  (i)         23      20       22     65
  (ii)        6       9        7      22
  (iii)       1       1        1      3
  (iv)        0       0        0      0
TONE
  (i)         11      12       12     35
  (ii)        14      13       14     41
  (iii)       2       2        2      6
  (iv)        2       1        0      3

For totals, (2 × 4) χ² = 11.93; P < 0.01.


Impulsivity/sociability and reinforcement in verbal operant conditioning


B. S. Gupta and Mohanjeet Nagpal 


The present investigation was designed to study the relationship between impulsivity/sociability and modes 
of reinforcement in verbal operant conditioning. Two 2x3 randomized block designs, one each for 
impulsivity and sociability, were replicated ten times. One hundred and twenty undergraduate female 
students (60 for impulsivity and 60 for sociability) were individually subjected to Taffel’s verbal conditioning 
procedure. When the conditioning scores of high and low scorers on the impulsivity and sociability scales
were compared, it was found that under rewarding conditions (‘good’ and ‘buzzer’ in respect of sociability
and ‘good’ in respect of impulsivity) the high scorers obtained the higher score, whereas under the punishing
condition (‘electric shock’) the low scorers did. The study also revealed that the high scorers (on the
impulsivity scale) conditioned more under rewarding conditions while the low scorers (both on the 
impulsivity and sociability scales) conditioned more under punishing ones. 


Gray (1970, 1971) hypothesizes that extraverts are more susceptible to reward while introverts 
are more sensitive to punishment. Thus according to Gray the increasing degree of sensitivity to 
reward, as compared to sensitivity to punishment, represents the increasing degree of 
extraversion. Extraverts are, therefore, expected to perform better under rewarding conditions 
while the opposite is expected of introverts. Gray’s assumption was partially supported in a 
recent study on verbal operant conditioning (VOC) by Gupta (1976) in which extraverts and 
introverts conditioned differently to rewarding and punishing reinforcers. Extraversion is 
probably composed of two main subfactors, impulsivity and sociability (Eysenck & Eysenck, 
1969). The authors maintain that these two subfactors are by no means independent but show a 
reasonably close positive relationship (r = 0.468; n = 300; P < 0.01). It is, therefore, desirable to
study the conditionability of individuals varying on the impulsivity and sociability scales. 

The major objective of the present study was to examine the relationship, if any, between 
impulsivity/sociability and modes of reinforcement (reward and punishment) in VOC. Following 
Gray (1970, 1971) it was hypothesized that reward would be more effective for subjects scoring 
high on the impulsivity and sociability scales while punishment would be more effective for 
those scoring low on these scales. 


Method 
Subjects 


The subjects were selected out of a sample of 947 undergraduate female students of various women’s 
colleges. In order to determine their scores on the impulsivity and sociability scales of the extraversion 
dimension, the Eysenck Personality Inventory (Eysenck & Eysenck, 1964) was administered to all the 
students ranging between 16 and 21 years of age. On the basis of their scores on these scales 120 students, 60 
for the impulsivity and 60 for the sociability groups, were selected. The impulsivity/sociability scores of the 
selected subjects were as follows: 


Personality     Level           Score range

Impulsivity     High scorers    6 or more
                Low scorers     3 or less

Sociability     High scorers    8 or more
                Low scorers     3 or less


The selection of the subjects was done on the basis of mean ± 1.0 s.d. of the impulsivity/sociability scores
(impulsivity: mean = 4.45, s.d. = 1.47; sociability: mean = 5.53, s.d. = 2.09). Ten subjects were randomly
assigned to each of the three reinforcement conditions at each level of impulsivity/sociability.
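
The score ranges follow from the means and standard deviations just given. The sketch below assumes that each cut-off was obtained by rounding mean ± 1.0 s.d. to the nearest whole score; this rounding rule is an assumption made for illustration (it is consistent with the ranges reported, but the paper does not state it), and the function name is hypothetical.

```python
def selection_cutoffs(mean, sd, k=1.0):
    """Integer cut-offs at mean - k*sd and mean + k*sd, rounded to the nearest score."""
    low = round(mean - k * sd)    # low scorers: this score or less
    high = round(mean + k * sd)   # high scorers: this score or more
    return low, high

imp_low, imp_high = selection_cutoffs(4.45, 1.47)   # impulsivity
soc_low, soc_high = selection_cutoffs(5.53, 2.09)   # sociability
```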

Stimulus material


Taffel’s (1955) sentence completion technique was used (Gupta, 1973, 1976). The stimulus material consisted 
of 100 white unlined index cards (3 × 5 in). A neutrally toned verb in the past tense was typed in block letters
in the centre of each card. Five pronouns, I, WE, YOU, HE and THEY were inserted in one line in block 
letters below the verb. The sequence of pronouns on each card differed randomly. 


Experimental design 


A randomized block design with two levels of impulsivity/sociability and three treatments (reinforcement 
conditions) was used. The design involved a mixed model in which impulsivity/sociability was a random 
variable and reinforcement a fixed one. In such a design it is the mean square (variance) of the fixed variable 
which contains the extra interaction term (Dayton, 1970; Keppel, 1973). Hence, it is appropriate to evaluate 
the effect of impulsivity/sociability, a random variable, against the within-cells error variance; the effect of
reinforcement, which is fixed, against the interaction variance; and the interaction variance itself against the
within-cells error variance.
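
The choice of error terms described above can be expressed as three ratios of mean squares. The following is a minimal sketch, using the mean squares reported in Table 1 of this paper; the function and key names are illustrative only.

```python
def mixed_model_f_ratios(ms_random, ms_fixed, ms_interaction, ms_within):
    """F ratios for a two-factor mixed model with the error terms described in the text."""
    return {
        "random": ms_random / ms_within,            # personality level vs within-cells
        "fixed": ms_fixed / ms_interaction,         # reinforcement vs interaction
        "interaction": ms_interaction / ms_within,  # interaction vs within-cells
    }

# Mean squares as reported in Table 1 (two independent analyses).
impulsivity = mixed_model_f_ratios(3.75, 4.12, 52.65, 3.57)
sociability = mixed_model_f_ratios(4.81, 5.55, 30.82, 3.80)
```

The ratios obtained in this way agree with the F values in Table 1 to within rounding, which illustrates why the reinforcement effect is small in both analyses: its mean square is tested against the large interaction mean square.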


Procedure 


The subject’s task was to construct for each card a sentence containing the verb and beginning with one of 
the pronouns. The first 20 cards were used to establish the operant level (initial score). The experimenter 
(second author) simply noted the subject’s response whenever she began a sentence either with ‘I’ or ‘we’. 
For the next 60 cards reinforcement was given to the subject whenever she used ‘I’ or ‘we’ at the beginning 
of the sentence. The modes of reinforcement were: (1) good; (2) buzzer, like office bell, sounded for % sec; 
(3) an electric shock of 50 V AC at 5000 c/sec administered for 4% sec on the wrists of both the hands kept in 
a relaxed manner on a table in front of the subject. The straps of the electric shock device were tied on both 
wrists of the subject before the commencement of the experiment, irrespective of the reinforcement to be 
provided though the electric shock was administered only to one reinforcement group. This was done with a 
view to neutralizing the effect of the drive state produced by the stress-inducing experimental situation (Ekman,
1958). The word ‘good’ was spoken by the experimenter in a flat unemotional tone. This was the conditioning 
stage. For the last 20 cards the experimenter did not reinforce. The frequency of the personal pronouns, i.e.
‘I’ and ‘we’ responses, was used as a quantitative measure of the conditioned response. The order in which
cards were arranged was randomized from subject to subject by shuffling them before the commencement of 
each conditioning session. 

The conditioning score for each subject was derived by subtracting the frequency of her sentences 
beginning with personal pronouns (‘I’ and ‘we’) in the initial block of 20 cards (operant score) from the 
frequency of such sentences in the final block of 20 cards (test score). Gupta & Ravinderjit (1976) in a recent 
study on kinaesthetic figural after-effect have shown that this measure (post- minus pre-test score) correlates 
well with two other measures namely the inflexion ratio (Gupta, 1973, 1976) and the residual change score 
(Weintraub & Herzog, 1973). 
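
The conditioning score defined above is a simple difference between two block frequencies. A minimal sketch follows, assuming each subject's responses are recorded as the pronoun that began each of the 100 sentences; the data layout, function names and example subject are hypothetical.

```python
TARGET_PRONOUNS = {"I", "WE"}

def count_targets(pronouns):
    """Number of sentences in a block that begin with 'I' or 'we'."""
    return sum(1 for p in pronouns if p.upper() in TARGET_PRONOUNS)

def conditioning_score(responses):
    """Test score (last 20 cards) minus operant score (first 20 cards)."""
    if len(responses) != 100:
        raise ValueError("expected one response per card (100 cards)")
    operant = count_targets(responses[:20])
    test = count_targets(responses[-20:])
    return test - operant

# Hypothetical subject: 4 target pronouns in the first block, 9 in the last.
responses = (["I"] * 4 + ["HE"] * 16) + ["THEY"] * 60 + (["we"] * 9 + ["YOU"] * 11)
score = conditioning_score(responses)
```

A positive score indicates more ‘I’ and ‘we’ sentences in the final block than in the initial block, and a negative score indicates fewer.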


Results 


The conditioning scores for the ‘good’ and ‘buzzer’ reinforcing conditions were positive and for 
the ‘electric shock’ negative. This implies that the ‘I’ and ‘we’ responses were more frequent 
following ‘good’ and ‘buzzer’ and fewer following ‘electric shock’. Krasner (1958) and Salzinger 
(1959) also regard ‘good’ as the facilitator of ‘I’ and ‘we’ responses and ‘electric shock’ as the 
suppressor of such responses. According to the details given in the experimental design the 
conditioning scores were subjected to two-way analysis of variance, the results of which are 
reported in Table 1. 

The mean conditioning scores are presented diagrammatically in Fig. 1. The figure clearly 
indicates the relationship between impulsivity/sociability and modes of reinforcement in VOC. 

The differences between the mean conditioning scores (Fig. 1) of subjects scoring high and
low on the impulsivity and sociability scales for each of the reinforcement conditions were
evaluated by the least significant difference (LSD) test (Keppel, 1973). It was observed that
under the ‘good’ condition the high scorers on the impulsivity and sociability scales conditioned
more than their counterparts, the low scorers (LSD value: impulsivity = 3.20, P < 0.01;
sociability = 2.00, P < 0.05). The results were similar for the other rewarding condition, i.e.
‘buzzer’, but failed to reach an accepted level of significance in the case of impulsivity (LSD
value: sociability = 2.00, P < 0.05). Under the punishing condition, i.e. ‘electric shock’, a reverse
trend was observed: the low scorers on both the impulsivity and sociability scales achieved
higher conditioning scores than the high scorers (LSD value: impulsivity = 3.10, P < 0.01;
sociability = 2.30, P < 0.02). The high scorers on the impulsivity scale also conditioned more
under the ‘good’ and ‘buzzer’ conditions than under the ‘electric shock’ condition (LSD value:
‘good’ and ‘electric shock’ = 3.70, P < 0.01; ‘buzzer’ and ‘electric shock’ = 1.90, P < 0.05); the
low scorers, on the other hand, conditioned more under the ‘electric shock’ condition than under
the ‘good’ and ‘buzzer’ conditions (LSD value = 2.60 in each case, P < 0.01). Similarly the low
scorers on the sociability scale obtained higher conditioning scores under the ‘electric shock’
condition than under ‘good’ (LSD value = 2.60, P < 0.01) and ‘buzzer’ (LSD value = 3.20,
P < 0.01) conditions. All such differences for subjects scoring high on the sociability scale were
statistically non-significant.


Table 1. Results of analysis of variance

Personality    Source               d.f.    M.S.     F        P

Impulsivity    Impulsivity (I)      1       3.75     1.05     n.s.
               Reinforcement (R)    2       4.12     0.07     n.s.
               I × R                2       52.65    14.74    0.01
               Within               54      3.57

Sociability    Sociability (S)      1       4.81     1.26     n.s.
               Reinforcement (R)    2       5.55     0.18     n.s.
               S × R                2       30.82    8.11     0.01
               Within               54      3.80

Note: The results are based upon two independent analyses.


Figure 1. Verbal operant conditioning as a function of impulsivity, sociability and reinforcement.
(Mean conditioning scores under each reinforcement condition for subjects high and low on
impulsivity and on sociability.)



Discussion 


The significant F ratios (Table 1) for the interactions of impulsivity/sociability and reinforcement 
for VOC indicate that the differences between the means of various reinforcement conditions are 
not independent of the levels of impulsivity/sociability. The results, therefore, lend support to 
Gray’s (1970, 1971) assumption that individuals differ in their susceptibility to reinforcement. It 
is evident from Fig. 1 that subjects scoring high on the impulsivity and sociability scales 
(extraverts) condition more readily in the VOC situation under rewarding (‘good’ and ‘buzzer’) 
conditions while the low scorers on these scales (introverts) condition more readily under the 
punishing (‘electric shock’) condition. Eysenck & Levey (1972) also report that introverts 
display greater conditionability than extraverts only under certain conditions and that under 
other conditions the reverse is true. Thus, the results fail to support Eysenck’s hypothesis that 
the extraverts are, in general, less conditionable than introverts. 


The validity and reliability of self-report items for the measurement of
lateral preference 


Stanley Coren and Clare Porac 


In a series of experiments, self-report items for the measurement of eye, foot and ear preference were 
validated against individual behavioural measures. Test-retest reliabilities were obtained over a one-year 
period for eyedness, handedness and footedness self-report items. These data indicate that a questionnaire is
a valid and reliable method for the assessment of various forms of lateral preference. 


Many investigators have suggested that patterns of unilateral hand, foot, eye and ear preference 
indicate underlying physiological asymmetries in cerebral functioning. Thus, the preferred hand 
is often taken as an external indicator of the cerebral hemisphere which is dominant for speech 
(White, 1969). Relationships between various manifestations of lateral preference have also been 
shown to be psychologically important since some research links patterns of eye and hand usage 
to various forms of neurological and behavioural impairment (cf. Porac & Coren, 1976, for a 
review of this literature). 

Since lateral preference of limb or sense organ is often used as an independent variable to 
demarcate experimental groups, it is methodologically important to develop valid and reliable 
measurement techniques for such preferences. Although direct behavioural testing procedures 
exist (Harris, 1957; Kovac & Horkovic, 1970; Coren & Kaplan, 1973; Porac & Coren, 1975, 
1976) most require individual testing; hence, they are not convenient when a research problem 
demands the measurement of a large sample of individuals. This problem has led several 
investigators to develop questionnaires to assess lateral preference. The majority of these 
questionnaires are concerned with handedness (Hull, 1936; Humphrey, 1951; Annett, 1970; 
Oldfield, 1971), although some attempt to assess eye, ear and foot preference as well (Blau, 
1946; Friedlander, 1971; Kovac, 1973). 

Recently, Raczkowski, Kalat & Nebes (1974) undertook a behavioural validation of questions 
used to assess handedness. They compared self-report responses to a questionnaire to actual 
observed performance on similar tasks. Unfortunately, validation studies, such as this, are rare. 
In their absence there is no way to assess the validity of lateral preference questions. Therefore, 
the following set of investigations attempts to compare questionnaire responses with actual 
performance measures for the three remaining dimensions of lateral preference. In addition, 
reliability is assessed over a period of approximately one year. 


Experiment I 


The first experiment was designed to test the behavioural validity of questions which could be used to assess 
eye preference, more often called sighting dominance. This is the most commonly measured form of eye 
dominance, and it is the type of eye preference which is most analogous to handedness or footedness (Coren 
& Kaplan, 1973; Porac & Coren, 1976). 


Method 


Questionnaire items and behavioural measures. The seven questions which were used are shown in Table 1. 
All are based upon existing reports of behavioural tests for sighting dominance (Danielson, 1930; Crider, 
1944; Harris, 1957; Greenberg, 1960; Kovac & Horkovic, 1970; Porac & Coren, 1976). These questions were 
embedded in a 17-item questionnaire which also contained some handedness questions used by Raczkowski 
et al. (1974). The questionnaire contained two alternate wordings of question 3 (placed at different positions 
within the list) so that an index of the reliability of each individual’s responses could be obtained. Subjects 
used three response alternatives for all of the questions: right, left and both. The questionnaire responses 
were compared to the results from two behavioural tests of sighting dominance. Previous work has shown 
that these tests are valid and stable behavioural indices of an entire range of sighting dominance tests (Coren 
& Kaplan, 1973; Porac & Coren, 1975). 


Table 1. Percentage agreement between performance tests and questionnaire items in the 
determination of the side of the dominant eye 

Question                                                            Agreement (%)   Significance level
1. If you had to look into a dark bottle to see how full it was,    82.6            0.001
   which eye would you use?
2. Which eye do you use when peeping through a key hole?            82.0            0.001
3. Which eye do you use when looking through a telescope?           78.0            0.001
4. Which eye do you use when sighting down a rifle?                 76.5            0.001
5. Suppose you were bending to look under a bed, which eye          64.5            0.01
   would be closest to the floor?
6. Most people tend to carry their head with a slight tilt. Do you  59.6            0.05
   carry your head tilted to the right or the left?
7. If someone asks you to wink your eye, which one do you wink?     52.3            n.s.

1. Point Test: The experimenter stands approximately 2 m in front of the subject and asks the subject to 
point at his/her nose while keeping both eyes opened. The eye with which the outstretched finger is aligned 
is noted by the experimenter. This procedure is repeated several times, and the hand used to point with is 
alternated to control for handedness bias. 

2. ABC Test: Here, the subject covers the face with the wide end of a truncated cardboard cone which 
must be squeezed open with both hands in order to see through the apertures at either end. With both eyes 
opened, the subject is asked to align the cone with a target held by the experimenter approximately 2 m in 
front of the subject. The procedure is repeated several times, and the experimenter notes which eye is used 
in the alignment each time. 

Each performance test for the dominant eye was individually administered for four repetitions. Scoring was 
+1 for each alignment done with the right eye and -1 for each alignment performed with the left eye. The 
final dominance score was the algebraic sum of these eight test responses. Such a scoring procedure gives a 
graded estimate of both the side and the strength of the sighting dominant eye. 
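
The scoring rule just described can be sketched in a few lines of Python. This is an illustration only: the function name `dominance_score` and the 'R'/'L' coding of each alignment are assumptions introduced here and are not part of the original procedure.

```python
def dominance_score(point_trials, abc_trials):
    """Algebraic sum of +1 (right-eye) and -1 (left-eye) alignments.

    Each argument is a sequence of 'R'/'L' codes, one per repetition
    (four repetitions per test in the procedure described above).
    """
    trials = list(point_trials) + list(abc_trials)
    for code in trials:
        if code not in ("R", "L"):
            raise ValueError(f"unexpected alignment code: {code!r}")
    return sum(1 if code == "R" else -1 for code in trials)


# Hypothetical subject: three of four right-eye alignments on the Point Test
# and two of four on the ABC Test.
score = dominance_score(["R", "R", "R", "L"], ["R", "L", "R", "L"])
```

With eight trials the sum runs from -8 to +8; its sign gives the side of the dominant eye and its magnitude the strength of the dominance.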


Procedure. A sample of 119 undergraduate volunteers (64 males and 55 females) served as subjects. Each 
was individually tested on the behavioural measures of sighting dominance and then administered the 
questionnaire. All had normal or corrected-to-normal vision, tested at both near and far convergence 
positions using portions of the Keystone Visual Skills Test Battery and a Keystone Telebinocular. 


Results and discussion 


The composite behavioural score was used to classify each subject as either right-eyed (a 
combined score between +1 and +8), left-eyed (a combined score between -1 and -8) or 
ambiocular (a combined score of 0). The percentage agreement for the classification of the side 
of the dominant eye was computed for the behavioural score and each question separately. Table 
1 presents the concordance rates and the associated probability values which resulted from this 
analysis. It is difficult to ascertain what concordance level is acceptable without a knowledge of 
the degree of agreement between the results from alternate forms of behavioural testing for eye 
dominance. For this reason, the percentage agreement between the results of the Point and the 
ABC tests was also computed. Since these two tests show an 82 per cent concordance, it is 
clear that questions 1-4 in Table 1 agree with the composite behavioural score approximately as 
well as the two behavioural measures agree with each other. 
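
The classification and concordance computation described above can be sketched as follows. The function names and the toy data are hypothetical. The report does not state how the significance levels in Table 1 were obtained, so the exact two-sided binomial test against a 50 per cent chance rate shown here is only one conventional possibility, included as an assumption.

```python
from math import comb


def classify(score):
    """Side implied by a composite behavioural score (sign only)."""
    if score > 0:
        return "right"
    if score < 0:
        return "left"
    return "both"  # score of 0: no consistent side


def percent_agreement(behavioural_sides, reported_sides):
    """Percentage of subjects whose two classifications coincide."""
    pairs = list(zip(behavioural_sides, reported_sides))
    hits = sum(1 for b, q in pairs if b == q)
    return 100.0 * hits / len(pairs)


def two_sided_binomial_p(hits, n, chance=0.5):
    """Exact two-sided binomial probability against a chance rate (assumed test)."""
    expected = n * chance
    observed_gap = abs(hits - expected)
    total = sum(
        comb(n, k) * chance ** k * (1 - chance) ** (n - k)
        for k in range(n + 1)
        if abs(k - expected) >= observed_gap
    )
    return min(1.0, total)


# Hypothetical composite scores and questionnaire replies for five subjects.
scores = [8, -6, 0, 4, -2]
answers = ["right", "left", "both", "left", "left"]
agreement = percent_agreement([classify(s) for s in scores], answers)
```

Comparing each question's agreement with the 82 per cent concordance between the two behavioural tests, as the text does, gives a more meaningful benchmark than the chance rate alone.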

The responses to the two alternate wordings of question 3 were analysed to give some 
estimate of reliability. Concordance between the two items was quite high, and only seven out of 
the 119 subjects responded differently to the two administrations of the question. This 
corresponds to a reliability coefficient of 0.94 (P < 0.001). 
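
As a quick arithmetic check, and assuming the reported coefficient is simply the proportion of subjects who answered both wordings in the same way (the text does not define it further), the figure can be recovered from the counts given:

```python
n_subjects = 119
n_discordant = 7  # subjects who answered the two wordings differently

# Proportion of concordant subjects (assumed definition of the coefficient).
agreement = (n_subjects - n_discordant) / n_subjects
print(round(agreement, 2))
```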

These data indicate that a questionnaire can be used to assess the side of the sighting 
dominant eye. It is interesting to note that the pattern of concordance is similar for both right- 
and left-eyed sighters. 


Experiment II 


This experiment attempts to validate behaviourally a series of questionnaire items for the measurement of 
foot and ear preference. These dimensions of lateral preference are assessed in a manner which is analogous 
to the determination of preferred hand use. Thus, a series of tasks which allow an individual to use only one 
foot or one ear are specified and the selected member is recorded. 


Method 


Questionnaire items and behavioural measures. Four questions were used to assess foot and ear preference. 
These are listed in Table 2. These questions are based upon behavioural tests which have appeared in various 
batteries for the measurement of lateral preferences (Blau, 1946; Berman, 1973; Kovac & Horkovic, 1970; 
Higenbottam, 1975). These questions were imbedded in a 21-item questionnaire on sensory-motor 
coordination. As in Expt. I, responses of right, left and both were permitted. 


Table 2. Percentage agreement between performance tests and questionnaire items in the 
determination of foot and ear preference 

Question                                                            Agreement (%)   Significance level
Foot
1. With which foot do you kick a ball?                              97.7            0.001
2. If you went to step on a stool which foot would you put up       89.8            0.001
   on the stool first?
Ear
1. Into which ear do you put the earphone of a transistor radio?    94.4            0.001
2. If you wanted to listen to a conversation behind a closed door,  87.2            0.001
   which ear would you put against the door to hear the voices?


The behavioural measures used to assess foot and ear preference were derived from the same lateral 
preference batteries as the questions which were previously described. They were as follows: 

1. The subject was asked to kick a ball to the experimenter who stood approximately 2 m in front of him. 
The ball was positioned in front of the subject so that neither foot would be favoured. The foot used to 
perform this action was noted. 

2. The subject was asked to place a foot upon a chair. The foot placed upon the chair was noted. 

3. A watch was placed upon a table and a cloth cover was placed over it. The subject was asked to bend 
over the table until the ticking watch could be heard through the cloth. The ear which was turned toward the 
watch was noted. 

4. The subject was given an earphone and asked to position it in an ear as if about to listen to a transistor 
radio. The ear that the subject chose was noted. 

As in Expt. I, a +1 indicated a right-sided response and a -1 a left-sided one. The final behavioural scores 
were the algebraic sums of the two foot and the two ear preference tests. 


Procedure. A sample of 95 undergraduate volunteers (49 males and 46 females) served as subjects. Each was 
individually tested on the behavioural measures of footedness and earedness and then administered the 
questionnaire. This sequencing of events was the same as that of Expt. I. 
Table 3. Percentage agreement for test-retest of a sample of lateral preference questions 
measured over a one-year interval 

Question                                                            Agreement (%)   Significance level
Hand preference (from Raczkowski et al. 1974)
1. With which hand do you draw?                                     100             0.001
2. With which hand do you write?                                    100             0.001
3. With which hand do you hold a toothbrush?                        100             0.001
4. With which hand do you use a hammer?                             100             0.001
5. With which hand do you throw a ball to hit a target?             100             0.001
6. With which hand do you use a bottle opener?                      96              0.01
7. With which hand do you use an eraser on paper?                   96              0.01
8. With which hand do you remove the top card when dealing?         96              0.01
Eye preference
1. If you had to look into a dark bottle to see how full it was,    83              0.01
   which eye would you use?
2. Which eye do you use when peeping through a key hole?            76              0.05
3. Which eye do you use when looking through a telescope?           73              0.05
4. Which eye do you use when sighting down a rifle?                 76              0.05
Foot preference
1. Which foot would you use to kick a ball?                         96              0.01


Results and discussion 


On the basis of the behavioural tests, each individual was classified as preferring either the right 
or left foot and the right or left ear. Combined scores of +1 and +2 indicated a right-sided 
preference while scores of -1 and -2 showed a preference for the left side (0 indicated no 
preference). This behavioural score was compared to the classification obtained from each of the 
self-report items. The individual concordance rates and associated significance levels for these 
comparisons are presented in Table 2. 

One can obtain a composite self-report footedness score by computing an algebraic sum 
where each response favouring the left side is scored -1, each response favouring the right side is 
scored +1, and each response of both is scored 0. If this is compared to a composite 
behavioural score which has been derived in the same manner, one finds that the correlation 
between the two totals is r = 0.81 (P < 0.001). The correlation between the composite behavioural 
and questionnaire ear preference scores is also highly significant (r = 0.79, P < 0.001). Thus, the 
agreement between self-report and behavioural indices seems quite good for these lateral 
preference measures. 
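
The composite scoring and the correlation between the two totals can be illustrated with a short sketch. The names `SIDE_SCORE`, `composite_score` and `pearson_r` are introduced here for illustration, and the five subjects are invented data. The product-moment coefficient is assumed, since the text reports r without naming the formula.

```python
from math import sqrt

# Scoring rule from the text: right = +1, left = -1, both = 0.
SIDE_SCORE = {"right": 1, "left": -1, "both": 0}


def composite_score(responses):
    """Algebraic sum of the scored responses for one subject."""
    return sum(SIDE_SCORE[r] for r in responses)


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical replies to the two foot questions, and hypothetical
# behavioural totals, for five subjects.
replies = [("right", "right"), ("right", "both"), ("left", "left"),
           ("right", "right"), ("both", "left")]
questionnaire = [composite_score(r) for r in replies]
behavioural = [2, 2, -2, 2, 0]
r = pearson_r(questionnaire, behavioural)
```

Because each composite has only two items, the totals take just five values (-2 to +2), so ties are frequent and the coefficient is best read as a coarse index of agreement.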


Experiment III 


Given the fact that Expts I and II show that lateral preference questionnaire items can be designed which 
correlate well with performance measures, it is important to ascertain how stable the response patterns are 
over extended periods of time. This experiment attempts to ascertain the reliability of responses over the 
span of a year. 


Method 


Twenty-seven individuals who participated in Expt. I were located after an interval of approximately 12 
months. The lateral preference questionnaire from Expt. I was re-administered at this time with the deletion 
of the three least valid eyedness questions 5 
Results and discussion 


Based upon the initial test performance, each individual was classified as right or left (handed, 
footed or eyed) for each questionnaire item, and a similar classification was done upon the 
one-year retesting. The percentage of agreement between the subjective reports over this interval is 
shown in Table 3, together with the associated significance levels. 


It is interesting to note that all of the handedness items, which were taken from the 
Raczkowski et al. (1974) battery, show high reliabilities. Five of the items show 100 per cent 
concordance despite the passage of one year. The other three show approximately 96 per cent. 
These test-retest reliabilities are consistent with the figures reported by these investigators using 
a between-test interval of one month; hence the present findings provide evidence for the 
long-term stability of handedness questionnaire responses. Similar consistency is found for the 
foot preference item. 


After the one year interval, the eyedness questions show response concordance rates which 
average around 76 per cent. Although these are statistically significant they do seem to be 
somewhat lower than those obtained from the handedness and footedness measures. This 
suggests that eye dominance may be somewhat more malleable than other manifestations of 
lateral preference. However, the overall pattern of results is quite heartening. Questionnaire 
items have been obtained which are reasonably predictive of lateral preference behaviours and 
there seems to be good stability over an extended inter-test interval. 
A note on the development of recall of spatial location 


J. M. von Wright, P. Loikkanen and P. Reijonen 





Recall of spatial location was studied with 5, 8, 12-13, and 17-18 year old subjects. Pictures of objects were 
shown one at a time in one of the four quadrants of a projection screen which was either blank (NF) or 
divided by a cross into four quadrants (F). The presence of the frame (F) did not affect item recall, but 
facilitated location recall more, the younger the subjects. Intentional learning of location was superior to 
incidental learning in the youngest but not in older children. The results are in agreement with Bryant's 
(1974) analysis of perceptual development which emphasizes the use of external spatial frames of reference 
in the encoding of attributes of visual objects in young children. 





We often remember where we saw things, or the place on a page where something was written, 
though we did not pay particular attention to spatial attributes. Correspondingly, spatial 
information relating to or associated with an event can serve as an effective recall cue for that 
event. A number of recent studies have been concerned with conditions determining the 
encoding and retention of location. The results indicate that the organization of the setting in 
which objects occur has a major effect on memory for the position of the objects. For instance, 
retention of object location is substantially better when pictures of ‘real-world’ scenes rather than 
‘unorganized’ scenes are used as stimuli (Mandler & Johnson, 1976; Mandler & Parker, 
1976). A second result is that recall of location is usually closely correlated with recall of content 
(e.g., Gamble & Wilson, 1916; Cumming & Coltheart, 1969; Rothkopf, 1971; Zechmeister, 
McKillip, Pasko & Bespalec, 1975). Third, location often appears to be encoded ‘automatically’. 
Thus Schulman (1973) found that instructions to attend to the location of items in an array did 
not result in better location recall (or recognition) than incidental learning of locations. The three 
above-mentioned results were also obtained in a study by von Wright, Gebhard & Karttunen 
(1975), in which the subjects were 5, 8, and 12 year old children, and adults; relevant age-related 
differences were found only in the overall level of recall. However, in all of these experiments, 
the items were presented within a structured visual framework permitting an easy definition of 
the location of the items in terms of their spatial relationships. 

In his analysis of perceptual development, Bryant (1974) suggests that young children 
primarily use relative codes and that absolute codes develop only slowly with age. Thus the 
young child’s perception and memory of attributes of objects (their size, orientation, 
position, etc.) are heavily influenced by the relations of the objects to their surrounding frames 
of reference. For instance, Bryant found that while 5 year old children quite efficiently utilized 
the positional cue ‘in-line/out-of-line’, their ability to encode position in terms of left-and-right 
or up-and-down was rather limited. It would thus be expected that the absence of an external 
framework which permits a simple ‘definition’ of the position of an object in relation to other 
objects or conspicuous features of the external framework should make location recall 
relatively more difficult in young than in older children. 

This prediction was tested in an extension of the experiment reported by von Wright et al. 
(1975). In the former experiment, pictures of objects were presented in four-object arrays, 
location being defined in terms of position within the array. Recall of location was found to 
increase with age (from CA 5 onwards) in much the same way as recall of item information, 
independently of whether learning of location was intentional or incidental. In the present study, 
the objects were presented singly in one or another of the four quadrants of a projection screen. 
In one condition (F), there was a frame dividing the screen in four equal parts, in the other 
condition (NF) the screen was blank. It was expected that while the presence of the frame 
should not affect item recall, it should improve location recall more, the younger the children. In 
other words, impoverishment of the spatial frame of reference should have a greater detrimental 
effect on location recall for young than for older children. 


Method 
Subjects 


Subjects were 5 year old children (age range 5.0-5.11) from kindergartens, 8 year old children from second 
grade in primary schools, 12-13 year old children from the second grade in secondary schools, and 17-18 
year old adults from the seventh grade in secondary schools, n = 144 in each age group. The subjects were 
chosen without regard to sex and on the basis of availability. The subjects in each age group were randomly 
divided into four groups of 36 subjects each. In addition, 24 5 year old children served as subjects in a control 
(free recall) experiment. They were randomly divided into two groups of 12 subjects each. 


Task and procedure 


The stimulus material consisted of 32 slides. One quadrant of each slide was white with a coloured drawing 
of a common object placed in the middle, the rest (3/4) of the slide being black. Eight objects were shown in 
each quadrant. 

The experiment was performed individually with the 5 year old children, otherwise with groups of 6-10 
subjects at a time. The slides were shown successively for 3 sec each (change-over time included) on a 
Kodak Carousel projector in a randomized order. Thirty sec after completion of presentation, location recall 
was tested. In this test the 32 objects, each drawn in the middle of an (otherwise blank) slide, were shown 
one by one in a random order. The 5 year old subject was given a white card (representing the screen) with 
the four quadrants outlined in black. He was told that he would be shown the objects he had been 
memorizing, one by one, and that his task was to point to the quadrant in which the object had been 
originally located. The experimenter recorded his responses. (The procedure was identical to that used by 
von Wright et al. 1975.) The older subjects were given an answer booklet with 32 pages. In the middle of 
each page was a square, divided into four quadrants. The subject was told that he would be shown the 
pictures again, one by one, and that he should put a mark in the square to show the position (quadrant) of 
the picture on its original presentation. Each response was made on a separate page. 

Two stimulus presentation conditions were used. In condition F (frame), a black cross was fastened to the 
projection screen dividing it into four quadrants. In condition NF (no-frame) the projection screen was 
blank. 

Two learning instructions were used. In condition Int. (intentional learning of location), the recall 
procedure was fully described prior to learning; the subject thus knew that and how location recall would be 
asked for. In condition Inc. (incidental learning of location), the subject was only asked to memorize the 
objects. With two presentation conditions and two instructions there were four experimental conditions. 
Thirty-six subjects in each age group participated in each condition. 

In addition, two groups of 5 year old subjects performed the task under free recall (FR) conditions. The 
materials and the procedure were identical to those of condition Inc. in the main experiment except that the 
subjects, after presentation of the pictures, were asked to recall orally the names of the objects. Stimulus 
presentation condition F was used with one group, condition NF with the other. 


Results 


The means of the location recall scores are shown in Table 1. Since there were four response 
alternatives, ‘random’ responding would be expected to yield on the average 25 per cent, i.e. 
eight items, correct. Performance was substantially better than this in all groups but one, the 5 
year old children making only 31 per cent correct responses in condition NF-Inc. While this 
percentage is significantly greater than that expected on ‘chance’ (t = 4.10, d.f. = 35, P < 0.001), 
it is nevertheless rather low. 
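
The comparison with chance can be retraced from the summary statistics in Table 1. The sketch below assumes a one-sample t test of the group mean against the chance expectation of eight items, using the tabled standard deviation as the sample estimate. The variable names are illustrative.

```python
from math import sqrt

n_items = 32
chance_mean = n_items / 4        # four response alternatives: 8 items
mean, sd, n = 10.03, 2.97, 36    # CA 5, condition NF-Inc. (Table 1)

# One-sample t statistic against the chance expectation (assumed test).
t = (mean - chance_mean) / (sd / sqrt(n))
percent_correct = 100 * mean / n_items

print(f"t = {t:.2f}, d.f. = {n - 1}, per cent correct = {percent_correct:.0f}")
```

Under these assumptions the statistic agrees with the reported t = 4.10 on 35 degrees of freedom, and the mean of 10.03 items corresponds to the 31 per cent quoted in the text.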

A three-way analysis of variance of the recall scores yielded highly significant main effects of 
age and of frame, the main effect of instruction (Int. vs. Inc.) being negligible (F < 1). Recall 
increased regularly with increasing age (F = 73.71, d.f. = 3, 560, P < 0.001), improvement being 
greatest in condition NF-Inc. (108 per cent) and least in condition F-Int. (17 per cent). Recall 
was consistently better in condition F than condition NF (F = 28.79, d.f. = 1, 560, P < 0.001). 
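
The improvement percentages and the error degrees of freedom quoted above can be checked with simple arithmetic. The sketch assumes that ‘improvement’ means the relative gain in mean recall from the youngest (CA 5) to the oldest (CA 17-18) group within a condition. That reading reproduces the quoted figures but is not stated explicitly in the text. The cell means are taken from Table 1.

```python
def percent_gain(youngest_mean, oldest_mean):
    """Relative gain from the youngest to the oldest group, in per cent."""
    return 100 * (oldest_mean - youngest_mean) / youngest_mean


gain_nf_inc = percent_gain(10.03, 20.86)   # condition NF-Inc.
gain_f_int = percent_gain(16.31, 19.08)    # condition F-Int.

# Error degrees of freedom for a 4 x 2 x 2 between-subjects design
# with 36 subjects per cell.
ages, frames, instructions, n_per_cell = 4, 2, 2, 36
error_df = ages * frames * instructions * (n_per_cell - 1)

print(round(gain_nf_inc), round(gain_f_int), error_df)
```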
Table 1. Means and standard deviations of location recall scores 

                Frame (F)       No-frame (NF)   Means
Age             Int.    Inc.    Int.    Inc.    F       NF      Int.    Inc.
CA 5      M     16.31   14.25   13.94   10.03   15.3    12.0    15.1    12.1
          S.D.  4.54    4.94    5.04    2.97
CA 8      M     16.81   19.36   15.81   13.78   18.1    14.8    16.3    16.6
          S.D.  4.20    3.51    3.52    3.41
CA 12-13  M     18.11   19.50   17.39   17.19   18.8    17.3    17.7    18.3
          S.D.  4.38    4.51    3.59    4.73
CA 17-18  M     19.08   23.00   21.69   20.86   21.0    21.3    20.4    21.9
          S.D.  5.27    4.53    4.83    5.40





All first-order interactions are significant. The interaction instruction × frame (F = 19.02, 
d.f. = 1, 560, P < 0.001) shows that the presence of the frame did not, on the average, affect 
location recall when learning was intentional, but facilitated location recall substantially when 
learning was incidental. The interaction age × frame (F = 5.33, d.f. = 3, 560, P < 0.01) shows that 
the presence of the frame improved performance more the younger the subjects. In the two 
youngest age groups significantly more correct responses were obtained in condition F than in 
condition NF (P < 0.01, Tukey’s HSD-test, error mean square = 19.31), whereas the average 
difference was negligible in the adult group. The interaction age × instruction (F = 7.23, d.f. = 3, 
560, P < 0.001) shows that intentional learning of location was more beneficial to performance, 
the younger the subjects. In the youngest age group recall was better after intentional than after 
incidental learning (P < 0.01), whereas there was a slight tendency in the opposite direction in the 
oldest age group. The second-order interaction is not significant (F = 1.37). 
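
Because the design is balanced, the error mean square and the main-effect F ratios can be approximately reconstructed from the cell means and standard deviations in Table 1. The sketch below is a check under stated assumptions, not the authors' computation. It assumes the tabled S.D.s are sample estimates, so that the error mean square is the average of the 16 cell variances. Small discrepancies from the reported values are expected from rounding in the table. All names are illustrative.

```python
from itertools import product

AGES = ["5", "8", "12-13", "17-18"]
FRAMES = ["F", "NF"]
INSTRUCTIONS = ["Int", "Inc"]
N_PER_CELL = 36

# (mean, S.D.) per cell in the order F-Int, F-Inc, NF-Int, NF-Inc (Table 1).
ROWS = {
    "5":     [(16.31, 4.54), (14.25, 4.94), (13.94, 5.04), (10.03, 2.97)],
    "8":     [(16.81, 4.20), (19.36, 3.51), (15.81, 3.52), (13.78, 3.41)],
    "12-13": [(18.11, 4.38), (19.50, 4.51), (17.39, 3.59), (17.19, 4.73)],
    "17-18": [(19.08, 5.27), (23.00, 4.53), (21.69, 4.83), (20.86, 5.40)],
}

CELLS = {}
for age in AGES:
    for (frame, instr), stats in zip(product(FRAMES, INSTRUCTIONS), ROWS[age]):
        CELLS[(age, frame, instr)] = stats


def error_mean_square(cells):
    """Pooled within-cell variance; with equal n, the mean of cell variances."""
    return sum(sd ** 2 for _, sd in cells.values()) / len(cells)


def main_effect_f(cells, position, n_per_cell=N_PER_CELL):
    """F ratio for the factor at `position` in the cell key (0, 1 or 2)."""
    grand = sum(m for m, _ in cells.values()) / len(cells)
    levels = {}
    for key, (m, _) in cells.items():
        levels.setdefault(key[position], []).append(m)
    ss = 0.0
    for means in levels.values():
        level_mean = sum(means) / len(means)
        # Each level pools n_per_cell subjects from every cell it contains.
        ss += n_per_cell * len(means) * (level_mean - grand) ** 2
    df = len(levels) - 1
    return (ss / df) / error_mean_square(cells)


mse = error_mean_square(CELLS)
f_age = main_effect_f(CELLS, 0)
f_frame = main_effect_f(CELLS, 1)
f_instruction = main_effect_f(CELLS, 2)
print(f"MSE = {mse:.2f}, F(age) = {f_age:.1f}, "
      f"F(frame) = {f_frame:.1f}, F(instruction) = {f_instruction:.2f}")
```

Under these assumptions the reconstructed values fall close to the reported error mean square of 19.31 and to the reported F ratios for age and frame, and the instruction effect comes out below 1, as stated.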

Since the presence of the frame was found to improve location recall in the youngest subjects, 
and since location recall usually has been found to be closely correlated with item recall, a 
control experiment was performed in which free recall of object names was tested in conditions 
F and NF. Subjects were 5 year old children. The presence/absence of the frame during stimulus 
presentation did not affect item recall: the mean recall score was 8.73 (S.D. = 3.14) in condition F 
and 9.33 (S.D. = 3.56) in condition NF. 

Discussion 

The results show that the presence of the frame during stimulus presentation improved location 
recall more, the younger the children. It did not, however, improve item recall. The results are 
in agreement with Bryant’s theory which emphasizes the importance of relative codes in the 
encoding of attributes of visual objects in young children. 

A description of the recall test prior to presentation (condition Int.) has often been found to 
improve recall performance in 7-8 year old or older children, but to have rather limited or 
‘diffuse’ effects on the performance of younger children (e.g. Brown, 1975; Wellman, Ritter & 
Flavell, 1975; von Wright, 1977). In the present study, the effects of intentional learning of 
location appear to be rather complex. However, the most clear-cut effect is that intentional 
learning facilitated recall of location consistently in the youngest children, especially in the 
absence of the frame. One possible interpretation of this result is that information about the 
recall test served to direct the child’s attention to the rather inconspicuous frame provided by 
the projection screen itself and thus to improve his use of it as a frame of reference. 

In summary, the results suggest that the location of objects tends to be ‘automatically’ 
encoded by older children and adults even when the spatial framework is greatly impoverished: 
intentional learning of location facilitates location recall little if at all and may in some 
conditions even interfere with efficient encoding of location (see also Schulman, 1973). In young 
children, on the other hand, encoding of location is more closely dependent on the specific 
nature of the external frame of reference and appears to be facilitated by instructions directing 
the child’s attention to the available external framework. 
Role of symmetry in pattern reproduction 


J. B. Deregowski 


The role of enantiomorphs (mirror images) as units of coding of symmetrical patterns was investigated. 
Simple symmetrical matrices were presented tachistoscopically for 0.3 sec and were immediately followed by 
0.3 sec presentation of the same matrix but with half of its cells occluded. The subjects were required to 
reproduce the first matrix. Five patterns of occlusion were used: random, lateral, skew, vertical and 
horizontal. It was found that lateral occlusion (which leads to a display of an enantiomorph) did not differ in 
its effectiveness from horizontal and vertical occlusions, although it conveyed twice as much information. 
Lateral, horizontal and vertical occlusions all differed significantly from the other two types which gave 
inferior results. In another experiment the effects of random and symmetrical occlusions were investigated; 
it was found that random occlusion was more effective than symmetrical occlusion. The results of the 
experiments, it is argued, show that information theory and gestalt approaches to the problem of symmetry 
are to some extent complementary. 


Although there appears to be no established consensus about the cause of the relative ease with 
which stimuli symmetrical about an observer’s median plane (hereinafter called symmetrical) are 
recalled, in contrast to asymmetrical stimuli, the phenomenon itself is well established. It is 
supported, inter alia, by the experimental evidence of Attneave (1955), who investigated 
perception of symmetry by adults using simple matrices, by Fitts et al. (1956), who extended this 
work by using stimuli similar to bar diagrams, and by Paraskevopoulos (1968), who applied 
Attneave’s technique to young children. Furthermore, data obtained from relatively 
unsophisticated subjects (Shapiro, 1960; Deregowski, 1972) suggest that symmetry is one of the 
primary factors such subjects take into account when required to reproduce simple patterns of 
the type used in Block tests. It has also been argued (Deregowski & Ellis, 1974) that symmetry 
of stimuli and of arrangements of stimuli is an important determinant of performance on 
discrimination learning tasks. 

It would therefore appear that symmetry is a salient characteristic of a pattern. Such saliency 
might be described in either such gestalt terms as figural balance or figural goodness (Katz, 
1951), or in terms of informational redundancy which symmetrical stimuli possess. 

Experimental evidence obtained by Handel & Garner (1966) and Bell & Handel (1976) suggests 
that these two descriptions are not incompatible. Their work shows that dot patterns which were 
perceived as being relatively ‘good’ could also be said to belong to a smaller group of alternative 
patterns as defined by Prokhovnik’s (1959) criteria. Relevant evidence was also presented by 
Bear (1973). One of the tasks used by him required that subjects place an additional dot in a cell 
of a 3 × 3 matrix, four of the cells being already filled. The additional dot was to be placed in the 
position implied by the dots already present. The results obtained showed ‘that the elements of 
the better dot patterns were more predictable than the elements of the poorer dot patterns, in 
that a figure’s subpatterns suggested the remaining dot of the figure more strongly in the case of 
the better figures’. The patterns’ ‘goodness’ was assessed by rating the patterns on a seven-point 
scale (Handel & Garner, 1966). It is noteworthy that such rating discriminates clearly between 
symmetrical and asymmetrical patterns, the former being consistently judged to be superior to 
the latter. 

The phenomenon described by Bear and evidenced by his data can also be described as a 
tendency on the part of his subjects to complete the patterns they were given in such a manner 
as to obtain a symmetrical pattern. Such a tendency is not unique to the task in question and has 
also been observed under entirely different circumstances such as copying of simple geometric 
patterns by drawing (Shapiro, 1960) or by building replicas (Deregowski, 1974). 




There is however a difference between the extents to which symmetrical and asymmetrical 
elements evoke symmetrical completion of the patterns. Thus when symmetry about subjects’ 
median plane is considered, the mean predictability scores for symmetrical and asymmetrical 
subpatterns which lead to symmetrical five-dot patterns in Bear’s study are as follows: (a) for 
eight symmetrical subpatterns 0.96, (b) for six asymmetrical subpatterns 0.38. The largest 
possible score is, in both cases, 1.00. 

Although this observation cannot be regarded as definitive because of certain constraints 
which Bear’s experimental procedure imposed upon the responses which he obtained (it is 
possible that his attempt to convey to his subjects what is meant by perceptual implication 
inadvertently conveyed to them that a continuity of line is an important factor and thus led them 
to tend to respond by completing rows and columns in a matrix) it nevertheless suggests that 
subjects tend to render their stimuli symmetrical, that is to say, in Attneave’s (1954) terms, 
elements which they add do not increase the information already presented in the stimulus but 
merely confirm it by rendering it more redundant. The extent to which such an effect is present 
appears to be influenced by the redundancy of the subelement, the more redundant, 
symmetrical, elements fostering further redundancy with greater vigour than their less redundant 
counterparts. 

Since a simple symmetrical pattern consists of two enantiomorphs (symmetrically arranged 
parts), such a pattern can be readily coded in terms of one of them. A reconstruction of such a 
pattern calls for no further information than description of one of the enantiomorphs and the 
statement of the position of the axis of symmetry. Indeed Attneave’s description of symmetry 
makes such an assumption, and the assumption itself seems entirely acceptable at higher levels 
of cognitive functioning. One can scarcely doubt that given explicit oral instructions, ‘Here is an 
enantiomorph of a figure symmetrical about the vertical axis - construct the figure’, a subject 
would respond correctly. This, however, may not hold at a perceptual level, as shown by Bear’s 
study, which suggests that asymmetrical enantiomorphs may be less likely to foster correct 
reproduction of symmetrical stimuli than symmetrical elements of such stimuli. 

Dinnerstein (1965) has pointed out the similarity between the effect of previous and concurrent 
experience upon perception and suggested that ‘certain phenomena usually conceived of as 
effects of past experience on perception are subject to the same principles which govern 
perception itself’. Accordingly one would expect the salience of the symmetry of a pattern and 
the tendency to construct symmetrical patterns to be evident in a task involving a sequential 
element, just as it was present in Bear’s experiments described above wherein the tasks used 
were of the contemporaneous kind. 

Experiments reported below were intended to investigate the effect of presentation of various 
elements of a previously presented stimulus upon reproduction of that stimulus. Two hypotheses 
were put forward, a minor hypothesis: 

Hypothesis 1. That symmetrical stimuli exposed for short intervals will be reproduced 
correctly more often than asymmetrical stimuli, which, if rejected, would vitiate the major 
hypothesis, which was: 

Hypothesis 2. That symmetrical stimuli exposed for a short time and followed by an exposure 
of one of their enantiomorphic elements, will be reproduced correctly more often than such 
stimuli similarly presented and followed by an exposure of non-enantiomorphic elements. 


Experiment I 
Method 


Subjects. The subjects were 48 women drawn from a volunteer subject panel in a Scottish town; they were 
paid for participation. 

Apparatus. Twenty 4x4 matrices were prepared by allocating to each cell of the left-hand half of 
each matrix a black X or by leaving it blank, the two outcomes being equiprobable. These 
‘familial’ matrices were then used in devising both forms of stimuli used. For 
symmetrical stimuli the pattern arrived at was reproduced symmetrically in the right-hand half. 
Random patterns were obtained by filling the cells of the right-hand half of the matrix using the 
same method as used for the left-hand half. 
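
The construction just described can be sketched in code. What follows is a minimal illustration in Python, not the procedure actually used to prepare the cards; the function names and the representation of a matrix as a list of rows of booleans (True standing for a black X) are assumptions made for the example.

```python
import random

def make_half(rng, rows=4, cols=2):
    """Fill each cell of a half-matrix with an X (True) or leave it blank (False), equiprobably."""
    return [[rng.random() < 0.5 for _ in range(cols)] for _ in range(rows)]

def symmetrical_stimulus(left):
    """Reproduce the left-hand half symmetrically in the right-hand half."""
    return [row + row[::-1] for row in left]

def random_stimulus(left, rng):
    """Fill the right-hand half by the same random method as the left-hand half."""
    right = make_half(rng, rows=len(left), cols=len(left[0]))
    return [l + r for l, r in zip(left, right)]

def is_symmetrical(matrix):
    """True if the matrix is symmetrical about its central vertical axis."""
    return all(row == row[::-1] for row in matrix)
```

A pattern produced by `random_stimulus` may, by chance, turn out to be symmetrical; the text does not say whether such cases were excluded.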

Five types of occlusion were used. In all of them eight cells of the matrix were occluded by 
green squares, each square just filling a cell. Their types were: 

1. Random occlusion: the eight cells to be occluded were chosen at random. 

2. Lateral occlusion: either the left- or right-hand half of the matrix was occluded. 

3. Skew occlusion: two diagonally opposite quadrants of the matrix were occluded. 

4. Vertical occlusion: either the central two columns of cells or the extreme two columns of cells were occluded. 

5. Horizontal occlusion: either the two top or the two bottom rows of the matrix were occluded. 

All these types are shown in Fig. 1 (b, c, d, e, f). 

Figure 1. (a) Typical symmetrical stimulus; (b)-(f) Effect of various types of occlusion on stimulus (a): (b) 
random occlusion; (c) lateral occlusion; (d) skew occlusion; (e) vertical occlusion; (f) horizontal occlusion. 
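
The five types of occlusion listed above may be expressed as sets of occluded cells. The sketch below is illustrative only: the coordinate convention (row, column, counted from the top left), the function names and the keyword values are assumptions, not part of the original apparatus.

```python
import random

N = 4  # the matrices are 4x4
ALL_CELLS = [(r, c) for r in range(N) for c in range(N)]

def lateral(side="left"):
    """Occlude the left- or right-hand half of the matrix."""
    cols = (0, 1) if side == "left" else (2, 3)
    return {(r, c) for r, c in ALL_CELLS if c in cols}

def skew(first="top-left"):
    """Occlude two diagonally opposite quadrants."""
    same = first == "top-left"
    return {(r, c) for r, c in ALL_CELLS if ((r < 2) == (c < 2)) == same}

def vertical(which="central"):
    """Occlude the central two columns or the extreme two columns."""
    cols = (1, 2) if which == "central" else (0, 3)
    return {(r, c) for r, c in ALL_CELLS if c in cols}

def horizontal(which="top"):
    """Occlude the two top or the two bottom rows."""
    rows = (0, 1) if which == "top" else (2, 3)
    return {(r, c) for r, c in ALL_CELLS if r in rows}

def random_mask(rng):
    """Occlude eight cells chosen at random."""
    return set(rng.sample(ALL_CELLS, 8))

def occlude(stimulus, mask):
    """Render a stimulus with occluded cells shown as 'G' (a green square)."""
    return [["G" if (r, c) in mask else ("X" if stimulus[r][c] else " ")
             for c in range(N)] for r in range(N)]
```

Every mask so defined covers exactly eight cells, as the description requires.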


Each type of occlusion was used with four randomly chosen stimuli of the symmetrical series 
and four such stimuli of the random series. When, as in the case of lateral occlusion, two 
varieties of mask were possible (LH and RH) each of these was used twice. 

The stimuli were presented in an Electronic Developments Three Field Tachistoscope. The 
stimuli, which measured 6 cm by 6 cm, appeared to the subject at a distance of about 50 cm. The 
tachistoscope was set with a plain white field showing. This was followed by a 0.3 sec display of 
the unoccluded stimulus, which was immediately followed by a 0.3 sec exposure of the same 
stimulus partly obscured by a mask. 


Procedure. The subjects were tested individually. They were given a booklet of blank 4x4 matrices of the 
same size as the stimulus matrices and told that they would be shown two patterns one after the other. The 
first would look like a sample card (which was shown to the subject in the tachistoscope). The second 
pattern would be similar but some of the squares would be green. The subjects were particularly enjoined to 
pay attention to the first pattern which, they were told, they would have to reproduce in the booklet. They 
were asked to say when they had finished copying the pattern, so that the experimenter could get a new 
pattern ready. Each reproduction was made on a separate page of the booklet. 

Each subject was allocated at random either to respond to symmetrical or to asymmetrical series of 
patterns. 


Results 


From the total sample of subjects, three had to be rejected because of procedural errors. Of the 
remaining 45 subjects, 21 responded to random patterns and 24 to symmetrical patterns. The 
reproductions obtained from these subjects were judged to be correct when they corresponded 
exactly to the original stimuli. The frequency of correct responses within each stimulus category 
was noted. Table 1 summarizes the data obtained. It is apparent from the table (first and last 
columns) that the frequency of correct responses made under each condition is lower in the case 
of random patterns than in the case of symmetrical patterns. The first hypothesis is therefore 
supported by the data. 


Table 1. Mean number of correct reproductions per subject 

                                      Type of occlusion 
                                      Random    Skew      Lateral   Vertical      Horizontal  Unrelated symmetrical 

Symmetrical patterns: 
  (i) Original data (n = 24)          0.96      1.13      2.75      3.08          2.42        - 
  (ii) Replication data (n = 10)      0.90      0.80      2.50      3.20          2.10        - 
  (iii) Additional group I (n = 10)   -         1.40      2.70      2.50          2.70        1.10 
  (iv) Additional group II (n = 10)   -         1.70      2.70      2.[illegible] 2.40        1.20 
Asymmetrical patterns (n = 21)        0.05      0.10      0.29      0.19          0.10        - 


In order to evaluate the second hypothesis, that the stimuli which are 
followed by their enantiomorphic elements will be reproduced correctly more often than those 
followed by non-enantiomorphic elements, the frequencies of correct responses obtained from 
each subject to symmetrical stimuli within each type of occlusion were added together and these 
totals used in an analysis of variance. The overall analysis showed a significant effect due to the 
type of masking (F = 27.5, d.f. = 4, 92, P < 0.01). Comparison of treatment means using 
Duncan’s New Multiple Range test indicated that the effects of random and skew occlusion do 
not differ from each other. Similarly no evidence of difference was obtained between the other 
three types of occlusion: lateral, horizontal and vertical; there was, however, a significant 
(P < 0.01) difference between types of occlusion contained within these two groups, as shown 
below: 

Rand. (0.96) Skew (1.13) Hor. (2.42) Lat. (2.75) Vert. (3.08) 

The figures in brackets show the mean number of correct responses obtained with each type of 
occlusion. 

The results are not clear cut. The frequencies of correct responses made to horizontally and to 
vertically occluded stimuli do not differ from the frequency of such responses to the stimuli with 
lateral occlusion. Exposure of an enantiomorph, it appears, does not lead to correct reproduction 
significantly more often than exposure of symmetrical elements. There is also a significant 
difference between responses to stimuli with lateral occlusion and to stimuli with skew 
occlusion. These types were intended to embody the same characteristic: to present the entire 




Role of symmetry in pattern reproduction 221 


data required by a subject for reconstruction of a symmetrical stimulus, and were therefore 
thought likely to yield scores higher than those obtained with stimuli having random, vertical and 
horizontal occlusion. Since this was not the case, it was decided to replicate the experiment 
partially. The replication was confined to symmetrical matrices only and the procedure used was 
identical with that employed before. Ten additional subjects were drawn from the same 
population. A small, but important change was made in the materials employed: the stimuli which 
in the original experiment were subject to skew occlusion were used here with lateral occlusion, 
whilst those used previously with lateral occlusion were now subject to skew occlusion. This was 
done to ensure that the results obtained were not a result of interaction between the 
pattern/occlusion combinations used. The same mode of analysis as before was used. Again, 
analysis of variance shows that the results were significant (F = 15.99, d.f. = 4, 36, P < 0.01) and 
form a similar pattern to that obtained before. The results are shown below: 

Skew (0.80) Rand. (0.90) Hor. (2.10) Lat. (2.50) Vert. (3.20) 

As previously, the only significant (P < 0.05) difference is that between the two groups of 
stimuli shown above. 


Discussion 


The experiment and its partial replication are mutually confirmatory. Both present the same 
results of the effect of various types of occlusion upon reproduction of symmetrical stimuli. 

Thus whilst the results are in agreement with the first hypothesis they are contrary to the 
second hypothesis. The nature of this contrariness is twofold: (a) skew-occluded stimuli differ 
significantly from laterally occluded stimuli, in spite of carrying the same information load; (b) 
laterally occluded stimuli do not differ significantly from either vertically or horizontally occluded 
stimuli. These facts probably merit further consideration. 

It was thought that the effect of occlusion would be directly related to the information loss it 
causes. Thus occlusion of an enantiomorph of a figure leads to no loss of information; on the 
other hand vertical masking leads to a loss of 50 per cent. The former, therefore, was expected to 
yield more correct reproductions than the latter. Application of this rationale to the entire series 
of occlusions results in the following arrangement with respect to items of increasing 
frequency of expected correct responses: 

Vert. Hor. Rand. Skew Lat. 
the proportion of information available with each type of occlusion being as given in Table 2. 


Table 2. Effect of various types of occlusion used upon symmetrical stimuli 

Type of occlusion   Appearance of      Appearance of the   Proportion of information 
                    the mask           occluded stimuli    present in occluded stimuli (%) 

Lateral             Asymmetrical       Asymmetrical        100 
Random              Asymmetrical       Asymmetrical        50-100 
Vertical            Symmetrical        Symmetrical         50 
Horizontal          Symmetrical        Symmetrical         50 
Skew                Skew-symmetrical   Asymmetrical        100 
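
The proportions in the last column of Table 2 can be checked by a simple count. The sketch below assumes, for illustration, that a cell of a symmetrical 4x4 stimulus counts as 'available' when either it or its mirror image about the vertical axis remains visible; the function name and the particular masks chosen are assumptions made for the example.

```python
from itertools import combinations

N = 4
ALL_CELLS = [(r, c) for r in range(N) for c in range(N)]

def recoverable_fraction(occluded):
    """Fraction of cells whose content is visible directly or via the mirror-image cell."""
    visible = set(ALL_CELLS) - set(occluded)
    known = sum(1 for r, c in ALL_CELLS
                if (r, c) in visible or (r, N - 1 - c) in visible)
    return known / len(ALL_CELLS)

masks = {
    "lateral": {(r, c) for r, c in ALL_CELLS if c < 2},
    "skew": {(r, c) for r, c in ALL_CELLS if (r < 2) == (c < 2)},
    "vertical": {(r, c) for r, c in ALL_CELLS if c in (1, 2)},
    "horizontal": {(r, c) for r, c in ALL_CELLS if r < 2},
}
fractions = {name: recoverable_fraction(m) for name, m in masks.items()}

# Range over every possible eight-cell (i.e. "random") occlusion.
all_eight = [recoverable_fraction(m) for m in combinations(ALL_CELLS, 8)]
random_range = (min(all_eight), max(all_eight))
```

On this reckoning lateral and skew occlusion leave the whole pattern recoverable, vertical and horizontal occlusion leave half of it, and a random eight-cell occlusion falls somewhere between the two, in agreement with the table.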


The results obtained do not conform to the arrangement stated above. One has, therefore, to 
consider other explanations of this behaviour than those provided by simple information coding. 
It is plausible to argue, a posteriori, that the results obtained are to a large extent determined 
by the symmetry, or lack thereof, of the occluded stimuli. Table 2 shows that had this factor 
been considered the following sequence of results would have been postulated: 
Lat. Rand. Skew Vert. Hor. 
This is notably closer to the empirically obtained results, the only contrary placing being that of 
responses to stimuli with lateral occlusion which have been observed not to differ from those 
made to the stimuli with vertical and horizontal occlusion, an observation, incidentally, discordant 
with Attneave’s (1955) finding that the rate of transmission of information cannot be increased 
by rendering stimuli symmetrical. Under the present conditions, at least, it appears that a 
symmetrical stimulus containing half of the information contained in an enantiomorph is just as 
efficacious as an enantiomorph. 

It could perhaps be argued that the symmetrically occluded stimuli propagate the symmetry 
‘set’ which lateral occlusion tends to diminish, and that this set is so influential that its absence 
has to be compensated for by providing additional information (which lateral occlusion does 
indeed do). Adopting an extreme position one could postulate that reproduction would be 
facilitated by having an entirely unrelated but symmetrical stimulus follow the original stimulus in 
lieu of an occluded and related stimulus. This hypothesis was tested by running another group of 
subjects (n = 10) drawn from the same population under conditions similar to those used above 
but with the randomly occluded set replaced by a set of stimuli which were followed by 
unrelated symmetrical stimuli. The results obtained from these subjects are given in Table 1 
(third row). It could also be argued that had the subjects been more aware of symmetry they 
would have responded to the skewly occluded stimuli in the same manner as they responded to the 
laterally occluded. This argument was tested by introducing yet another modification. At the 
beginning of the experiment each of ten subjects was shown two cards similar to those used in 
the main part of the experiment, one bearing a symmetrical and the other an asymmetrical 
pattern. The subject was then given a shuffled pile of five symmetrical and five asymmetrical 
cards and was asked to sort them according to these criteria. Following this the subjects were 
tested on pattern reproduction using the procedure described immediately above. The results 
obtained are summarized in Table 1 (fourth row). 

Analyses of variance of the data from the two subsidiary experiments yield no indication of 
difference in responses due to the preliminary sorting task, nor has the introduction of the 
unrelated symmetrical task altered the pattern of results. This suggested that a general symmetry 
set is in itself not powerful enough to lead to correct reproduction; to do so the set must be 
related to the occluded stimulus. 

The results obtained in the main experiment can perhaps therefore be viewed as influenced by 
the two factors, the extent to which an occluded stimulus conveys information about the original 
stimulus and the extent to which the occluded stimulus shares in the essential characteristic of 
the original stimulus, viz. symmetry. These two influences, the data suggest, may be to some 
extent complementary so that the reduction in the information load which occurs in the case of 
the horizontally and vertically occluded conditions is compensated for by the symmetry of these 
patterns. In consequence the scores obtained do not differ from those obtained with the laterally 
occluded stimuli, which lack symmetry but present subjects with enantiomorphs and hence all 
the required information. 

A further effect which may influence the responses is that of the symmetry of occlusions. If 
this were to have a similar influence to that attributed to the symmetry of the occluded stimuli, 
the disruptive effect of occlusion decreasing with the increase in its symmetry, then one would 
expect randomly occluded stimuli to yield correct responses least often, all the other types of 
stimuli yielding greater frequencies in accordance with the following schema: 

Rand. Skew. Hor. Lat. Vert. 

That such an effect might be present is suggested by observations made by Bell & Handel 
(1976) who found that reproduction of ‘good’ patterns is little affected by the nature of the 
masks. On the other hand, when poor patterns were used by them ‘masks which were good 
patterns led to better identification than masks which were poorer patterns’. 

Since there are considerable procedural differences between the present experiments and those 
of Bell & Handel and since the ‘poor’ stimuli appear to be more likely to show the effect which 
the nature of occlusion may have upon responses it was decided to investigate the phenomenon 
using a procedure similar to that of the first experiment with random patterns as basic stimuli. 

The following hypothesis was put forward: That asymmetrical stimuli exposed for a short time 
and followed by symmetrical stimuli derived from them will be reproduced correctly more often 
than such stimuli followed by random stimuli. 


Experiment II 
Method 
Subjects were drawn from the same population as that used in the experiment reported above and the 
procedure used was essentially similar. It differed only in that a new set of stimuli was used. The stimuli 
consisted of ten random patterns similar to those of Expt. I. These patterns were used to generate two sets 
of occluding stimuli: (a) symmetrical occluding stimuli were derived by rotating the original pattern about the 
central ‘vertical’ axis of the matrix and superimposing the pattern thus obtained upon the original pattern; 
(b) random occluding stimuli were derived by filling at random a number of vacant cells within the matrix 
already containing the original stimulus. Thus both the generated sets contained the same patterns as the 
original stimuli but in one case these patterns were incorporated in symmetrical patterns, in the other in 
asymmetrical patterns. 
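
The derivation of the two sets of occluding stimuli may be sketched as follows. This is an illustration under stated assumptions, not the method actually used: the function names are hypothetical, a pattern is taken to be a list of rows of booleans, and the parameter `extra` (how many vacant cells are filled at random) is an assumption, since the text says only 'a number of vacant cells'.

```python
import random

def symmetrical_occluder(pattern):
    """Superimpose the pattern on its mirror image about the central vertical axis."""
    return [[a or b for a, b in zip(row, row[::-1])] for row in pattern]

def random_occluder(pattern, extra, rng):
    """Fill `extra` randomly chosen vacant cells of the matrix holding the original pattern."""
    vacant = [(r, c) for r, row in enumerate(pattern)
              for c, filled in enumerate(row) if not filled]
    chosen = set(rng.sample(vacant, min(extra, len(vacant))))
    return [[filled or (r, c) in chosen for c, filled in enumerate(row)]
            for r, row in enumerate(pattern)]

def contains(occluder, pattern):
    """True if every filled cell of the original pattern is also filled in the occluder."""
    return all(o for orow, prow in zip(occluder, pattern)
               for o, p in zip(orow, prow) if p)
```

In both cases the occluding stimulus contains the original pattern; only the first is guaranteed to be symmetrical.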

Fourteen subjects responded to ten stimuli each, the procedure being identical with that used before but 
for the fact that the occluding stimulus was in each case chosen at random with the restriction that five of 
the patterns used with each subject were followed by one type and five by the other. 


Results 


The subjects’ reproductions of the stimuli were scored as correct or incorrect. In the case of 
stimuli followed by a symmetrical pattern subjects made 22 correct and 48 incorrect responses. 
In the case of stimuli followed by random patterns the corresponding scores were one and 69. 
Application of the sign test to the ratios of correct to incorrect responses under the two 
conditions shows a highly significant difference (P < 0.003) in the predicted direction. Subjects 
were found to be better when the occluding stimulus was symmetrical than when it was 
asymmetrical, although in both cases the occluding stimulus contained the original pattern. The 
data are in agreement with Bell & Handel’s findings. Symmetrical occlusion is less effective as a mask than 
an asymmetrical occlusion. Since both types of occlusion incorporated the original pattern the 
observed difference cannot be attributed to a difference in quantity of information stored, but is 
probably due to perceptual interaction between the original patterns and occluding stimuli. 
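
For readers unfamiliar with the sign test, a minimal sketch of the exact computation is given here. It assumes a two-sided test with tied subjects discarded beforehand; the text does not state which form of the test was applied, and the function name is hypothetical.

```python
from math import comb

def sign_test_p(n_plus, n_minus):
    """Exact two-sided sign-test probability for n_plus positive and n_minus negative differences."""
    n = n_plus + n_minus
    if n == 0:
        return 1.0
    k = min(n_plus, n_minus)
    # Probability of a split at least as extreme as k under a fair-coin null hypothesis.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

By way of a purely hypothetical illustration (the per-subject scores are not reported), ten subjects favouring one condition and none the other would give `sign_test_p(10, 0)`, that is 2/1024 or roughly 0.002.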


General discussion 


The combined results of the reported experiments suggest that the perceptual processes involved 
in the tasks used cannot be adequately described either by the simple version of 
information-processing theory or simple gestalt paradigm. 

Although, as shown by the first experiment, symmetrical patterns are reproduced with greater 
accuracy than random stimuli, and hence both the notions of redundancy and of goodness of 
pattern are supported, the possible simplistic conclusion suggesting effective equivalence of the 
two descriptions is questioned by the analysis of the effect of various types of occlusion. The 
latter shows that the perceptual system behaves in such a manner as to suggest that such 
characteristics of the stimuli as symmetry and information content can to some extent be 
mutually compensatory but such flexibility is not infinite. Had it been such, the skew-symmetry 
occlusion would have yielded results not differing from those obtained with lateral occlusion. 

The results yielded by the additional groups of the first experiment clearly show that global 
accentuation of the nature of the stimuli alone is not sufficient to affect performance. 

Although it seems unlikely that the subjects do not have a specific cognitive category for 
symmetrical patterns [indeed studies of discrimination learning (Deregowski & Ellis, 1974) 
clearly show that symmetry is taken account of by even young children and Bear’s work (1973) 
suggests that asymmetrical patterns have strong symmetrical implications], it is apparent from 
the results that symmetry of an occlusion when it is unrelated to the original pattern leads to 
relatively low scores. On the other hand when such relationship is present a marked 
improvement in the frequency of correctly reproduced patterns obtains. 

Although the present experimental approach differs greatly from that used by Bell & Handel 
(1976) it probably does involve the same perceptual processes and hence the results obtained 
should be, in the gross, comparable. The agreement between these two sets of results, in that 
‘good’ patterns are found to be relatively ineffective as masks, has already been mentioned. In 
contrast no support is provided by the present data for Bell & Handel’s suggestion that masked 
stimuli are coded holistically. This suggestion was derived from the consideration of errors made 
by Bell & Handel’s subjects. These it was observed did not suggest any sequential encoding 
strategy, and hence were taken to indicate holistic encoding. The present results do not sustain 
this and suggest that in symmetrical patterns at least the encoding may involve subunits, which 
may be simple enantiomorphs. Such an interpretation agrees with Attneave’s description of the 
processes involved. 


The subliminal perception of movement and the course of autokinesis 


Peter Walker and R. R. Meyer 





The course of autokinesis is shown to be sensitive to the real movement of a surrounding stimulus. With the 
supraliminal presentation of this stimulus, apparent movement in a direction opposite to that of the real 
movement is induced. With the subliminal presentation of the same stimulus the real movement serves to 
inhibit autokinesis by inducing brief periods of stationarity between the phases of upward and downward 
apparent movement. The results confirm previous findings that the movement of a stimulus may be 
discriminated without there being any perceptual (phenomenal) adjunct. 





Walker (1975) demonstrated the subliminal perception of movement by showing that the course 
of binocular rivalry may be influenced by the movement of a subliminal pattern that is presented 
within one of the rivalling fields. In order to confirm the generality of the phenomenon, an 
attempt was made to demonstrate the subliminal perception of movement in a rather different 
context. The course of autokinesis was selected as the dependent variable since it is known to 
be particularly sensitive to the presence of additional information in the visual field (cf. Royce 
et al. 1966). It was predicted that the course of autokinesis would be sensitive to the real 
movement of a surrounding pattern that was presented at a subliminal level. Studies of induced 
movement (cf. Wallach, 1959) would lead us to anticipate such sensitivity in the case of the 
supraliminal presentation of real movement. 


Method 


Subjects restricted themselves to reporting the apparent upward and downward movement of a spot-source 
of light. Apparent movement in a horizontal direction was ignored, so that, for example, the subject was 
instructed to regard apparent movement of the spot in a diagonally upward direction as upward movement. 

Each subject completed three trials during which he reported the vertical component of the autokinetic 
movement. The trials were distinguished by the condition of a pattern that was projected on to a perspex 
screen immediately behind and surrounding the spot source of light. For the main group of 18 subjects this 
pattern was always subliminal (as defined below), but on any trial was either moving vertically upward 
(upward condition), vertically downward (downward condition), or was stationary (stationary condition). An 
equal number of subjects completed the three trials in each of the six possible orders. From the data 
recorded on each trial, the following could be calculated: the mean duration for which the spot appeared to 
move, without interruption, in an upward (downward) direction; the mean duration for which the spot 
remained stationary; the frequency and total duration of each of these response states; the mean duration of 
continuous apparent movement, regardless of the direction of this movement. 

A further group of just six subjects undertook the same task with the pattern presented at a level above 
the awareness threshold. Each of these subjects completed the three trials in a different order. 


Apparatus 

A 10 mA micro-lamp, that was mounted inside a small cylinder that had only a pin-hole in the edge facing 
the subject, served as the stimulus for the autokinetic effect. This cylinder was fixed against the centre of a 
translucent perspex screen (0-4 m diameter) which was let into an otherwise opaque frame that rested on the 
floor. The screen was situated 1 m from the seated subject, and the moving pattern was projected on to it by 
passing the edge of a rotating transparent disc through the focal plane of an Aldis Tutor 1000 projector. 
Commercial Letratone, pattern LT 100 (a quasi-random pattern, cf. Walker, 1975, Fig. 1), was applied to the 
disc. Viewed from this distance, the projected pattern traversed in a linear path, with a velocity of 20 deg 
sec⁻¹. The irregular and fine texture of the pattern ensured that no fluctuation in gross intensity accompanied 
the movement, and that the pattern never took on a striped appearance. For the stationary condition the 
transparent disc was prevented from rotating, whilst for the upward and downward conditions the disc was 
made to rotate in opposite directions. 

The experiment was undertaken in a light-proof cubicle, and the projector, motor, and disc were 
positioned inside a thick, opaque enclosure. Sufficient light strayed, however, for fully dark-adapted subjects 
to be aware of the translucent perspex screen. The pattern, therefore, was projected on to an otherwise 
dimly illuminated screen. 

A series of Kodak Wratten neutral density filters was placed in front of the projector lens. The critical 
effect of interposing a filter in this way was to decrease the contrast of the projected pattern. 

Two microswitches were provided for the subject to report the vertical component of the autokinetic 
movement, and each switch was connected to one channel of an Esterline Angus multiple channel 
pen-recorder. 


Subjects 
Ten male and 14 female undergraduate students, ages 19-23 years, served as subjects; none were students of 
psychology. 


Procedure 


The three experimental trials were preceded by a period of dark adaptation that lasted 25 min. The absolute 
awareness threshold for the (upward) moving pattern, as defined in the previous study (Walker, 1975), was 
then determined. Thus, this threshold was defined as the lowest intensity level (in terms of the value of the 
neutral density filter placed between the projector and perspex screen) at which the subject ever reported an 
awareness of being stimulated by the moving pattern during a threshold determination procedure (cf. Dixon, 
1971, p. 12). The randomized double-staircase procedure described by Cornsweet (1962) was employed, the 
size of the ‘steps’ by which the intensity level was either decreased or increased corresponding to a value of 
0.1 for the neutral density filter. In respect of this procedure the subject was given the following instructions: 
‘When instructed, open your eyes and look at the screen in front of you. What I shall do is sometimes 
present a moving pattern on the screen, and sometimes not. What I want you to do is each time decide 
whether or not the moving stimulus is there. You do not actually have to see the whole pattern in motion; 
if you think you can see any movement, say “yes”. The moving pattern is the only thing that will be 
presented.’ Subjects were asked to look at the screen for a period of at least 5 sec before reaching a 
decision, since the results from a pilot study indicated that the threshold was, to a certain extent, dependent 
upon the total time for which the pattern was viewed (cf. Kolers, 1972). In order to avoid their seeing the 
neutral density filters being changed, subjects were also asked to close their eyes between trials. After the 
value of the projected light had levelled out, oscillating between two values over five successive trials in 
each of the two staircase series (cf. Cornsweet, 1962), the procedure was terminated. A note was made of 
the lowest intensity level (highest value of neutral density filter) at which the subject reported the moving 
pattern to be present. For the trials involving the presentation of the pattern (moving or stationary) at a 
subliminal level, a 0.3 filter was added to the threshold-value filter. A corresponding reduction in the value of 
the filter ensured that the pattern was supraliminal when this was appropriate. 
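
The threshold procedure may be made concrete by a small simulation. The sketch below is a simplified reading of the randomized double-staircase method and not a reproduction of Cornsweet's (1962) procedure: the `observer` function, the starting values, the trial limit and the exact form of the stopping rule are all assumptions introduced for illustration. Filter values are rounded to one decimal place to match the 0.1 step.

```python
import random

def levelled_out(levels, run=5):
    """True once the last `run` trials of a series used exactly two filter values."""
    recent = levels[-run:]
    return len(recent) == run and len(set(recent)) == 2

def double_staircase(observer, start_a, start_b, step=0.1, rng=None, max_trials=1000):
    """Return the highest filter value at which the observer ever reported the pattern."""
    rng = rng or random.Random()
    current = [start_a, start_b]      # filter density for each of the two series
    history = [[], []]
    highest_yes = None
    for _ in range(max_trials):
        s = rng.randrange(2)          # choose one of the two series at random
        density = round(current[s], 1)
        history[s].append(density)
        if observer(density):         # "yes": denser filter, fainter pattern
            if highest_yes is None or density > highest_yes:
                highest_yes = density
            current[s] = density + step
        else:                         # "no": lighter filter, brighter pattern
            current[s] = density - step
        if levelled_out(history[0]) and levelled_out(history[1]):
            break
    return highest_yes

# Hypothetical, perfectly consistent observer who detects the pattern up to density 1.5.
threshold = double_staircase(lambda d: d <= 1.5, 0.5, 2.5, rng=random.Random(1))
subliminal_filter = round(threshold + 0.3, 1)   # a 0.3 filter added to the threshold value
```

A real observer's responses near threshold would of course be variable, which is why the lowest intensity at which the pattern was ever reported, rather than a mean, was taken as the threshold.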

The subject next undertook the three trials of the experiment, each of which lasted 4 min, in an order that 
was randomly selected from the six alternatives. Subjects were asked to report the vertical component of the 
apparent movement by alternately closing the two microswitches. More specifically, they were asked to 
press the switch in their left hand whenever and for as long as the spot appeared to move upward, and to 
press the switch in their right hand whenever and for as long as the spot appeared to move downward. When 
there was no vertical movement the subject was instructed to refrain from pressing either switch. The end of 
each 4 min period was indicated to the subject by a short auditory signal, and this was followed by a 30 sec 
rest period during which the subject closed his eyes. 

Finally, precautions were taken to detect those subjects for whom the subliminality of the pattern, in that 
condition, was unreliable. To this effect subjects were approached immediately following the three 
experimental trials with the question: ‘You did see the moving pattern didn’t you?’ It was intended that 
subjects answering positively to this question would be rejected from the experiment. Subjects answering in 
the negative were nevertheless subjected to a further interrogation, being asked: ‘What do you think the 
experiment is about - what is it that I am interested in?’ ‘Did you see anything that you may not have 
expected to see?’ ‘On how many trials, and which trials did I show you the moving pattern?’ A subject’s 
answers to these additional questions were interpreted in terms of the degree to which they indicated that he 
was aware of the true purpose of the experiment, and the extent to which this knowledge derived from an 
awareness of the supposedly subliminal stimulus. In the event, these additional questions were unnecessary. 


Results and analysis 


No subject had to be rejected from the experiment as a result of the interrogation 
concerning the subliminal stimulus. That is, all subjects gave a negative reply to the first 
question, and remained consistent with this in answering the additional questions. Thus, they 
were unable to guess the purpose of the study, did not see anything other than an homogeneous 
screen and spot of light, and finally were unable to offer an answer to the last question. 


Effects from the supraliminal presentation of movement 


With regard to the mean duration of continuous apparent movement in a particular direction, 
the results demonstrate that this was relatively longer for apparent movement in the direction 
opposite to the real movement. From each trial, the difference between the mean duration of 
continuous upward movement and the mean duration of continuous downward movement was 
determined. A non-parametric trend test for repeated measures, a development from Kendall’s 
rank correlation methods (cf. Walker, 1976a for the formula), revealed a significant trend across 
the three conditions (Z = 1.919, P < 0.05), the average difference scores for the upward, 
stationary and downward conditions being -4.71, 0.03 and 5.24 sec respectively (cf. Fig. 1). 

Figure 1. Averaged over subjects, the mean duration for which the spot of light appeared, during a trial, to 
move up (▲), remain stationary (■), and move down (▼) without interruption, when the surrounding pattern 
was presented at a supraliminal level and made to move upwards, remain stationary, or move downwards. 
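
The difference score described above can be sketched as follows. The record format (a list of phases, each with a direction and a duration in seconds) is a hypothetical representation of the microswitch record, and treating a missing direction as a mean of zero is an assumption; neither is specified in the original report.

```python
from statistics import mean


def mean_duration(phases, direction):
    """Mean duration (sec) of uninterrupted phases in one direction.

    Returns 0.0 when no such phase occurred (an assumption).
    """
    durations = [d for kind, d in phases if kind == direction]
    return mean(durations) if durations else 0.0


def difference_score(phases):
    """Mean upward duration minus mean downward duration for one trial."""
    return mean_duration(phases, "up") - mean_duration(phases, "down")


# Invented example record for a single trial.
trial = [("up", 4.0), ("still", 2.0), ("down", 10.0), ("up", 6.0), ("down", 8.0)]
score = difference_score(trial)
print(score)  # -4.0
```

A negative score indicates that downward apparent movement was, on average, sustained for longer than upward apparent movement.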


Effects from the subliminal presentation of movement 


The most immediate pattern emerging from the results indicated that the course of autokinesis 
on trials subsequent to the first simply reflected the course of apparent movement arising on the 
first trial (cf. Figs 2 and 3). Taking this finding into account, statistical testing for the effects of 
the experimental variable was confined to the data arising on the first trial. 

Contrary to the effects of movement at a supraliminal level, in this condition there was no 
significant tendency for the moving pattern to induce movement in the opposite direction. When 
an analysis was undertaken on the values for the difference between the mean duration of 
continuous upward apparent movement and the mean duration of continuous downward apparent 
movement, the two movement conditions did not differ significantly (Mann-Whitney, U = 10, 
P > 0.05; n₁ = n₂ = 6, cf. Fig. 2). However, it was revealed that with the presentation of 
movement, in either direction, the stationary phases of autokinesis were more frequent (U = 14, 
P < 0.05; n₁ = 6, n₂ = 12, cf. Table 1) and reduced in mean duration (U = 14, P < 0.05; n₁ = 6, 
n₂ = 12, cf. Fig. 2). As would be expected from this change in the frequency of the stationary 
phases, the movement of the surrounding pattern gave rise to a decrease in the mean duration of 
continuous apparent movement, when the direction of this movement was ignored (U = 16, 
P < 0.05; n₁ = 6, n₂ = 12, cf. Fig. 3). Finally, the mean duration of continuous apparent 
movement was not significantly changed by the presence of movement, when either apparent 
movement upwards, or apparent movement downwards was considered separately. 

Table 1. Results from the first subliminal trial undertaken by the six subjects in each of the three 
conditions of the experiment: (a) the frequency of the stationary phases; (b) the probability that 
a movement phase would be immediately followed by a stationary phase rather than by 
movement in the opposite direction 

Condition                          Individual subjects 
Upward (subjects 1-6)        (a)    30     16     30      9     51     26 
                             (b)  1.00   0.81   0.89   0.89   0.83   0.93 
Stationary (subjects 7-12)   (a)     1      1     23      7      1     18 
                             (b)  0.00   0.00   0.83   0.35   0.00   0.95 
Downward (subjects 13-18)    (a)     8     19      9      8     12     18 
                             (b)  1.00   0.63   0.89   0.26   1.00   0.88 

Figure 2. Averaged over subjects, the mean duration for which the spot of light appeared during a trial to 
move up (▲), remain stationary (■) and move down (▼) without interruption, over successive subliminal 
trials. Results from subjects who first undertook the upward (short-dashed line), stationary (solid line) or 
downward (long-dashed line) condition. Note that for the second and third trials the distinction between the 
three experimental conditions is not maintained. 

Figure 3. For successive subliminal trials, the mean duration of continuous apparent movement, direction 
ignored (cf. Fig. 2 for code). 

From this pattern of significant results, the critical effect of the movement of the subliminal 
pattern would appear to be the induction of stationary phases, of shorter duration than observed 
with the pattern stationary, at a time when autokinesis is changing direction. Looking at a rather 
different aspect of the results confirms this. Given in Table 1 are the values, derived from each 
subject’s first trial, for the probability that a period of autokinetic movement, in a particular 
direction, would be followed by a period of stationarity, before appearing to move in the 
opposite direction. A Mann-Whitney U test that compared the two movement conditions against 
the stationary condition, revealed that with the presentation of the moving pattern periods of 
autokinetic movement in a particular direction are more likely to be immediately followed by a 
stationary phase than by movement in the opposite direction (U = 14, P < 0.05; n₁ = 6, n₂ = 12). 
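
The comparison just described can be retraced from the probabilities in row (b) of Table 1. The sketch below is not the original computation: it uses the textbook definition of the Mann-Whitney U statistic with ties counted as one half, and the function name is illustrative. The data are the row (b) values as read from the table, with the two movement conditions pooled.

```python
def mann_whitney_u(x, y):
    """Smaller of the two Mann-Whitney U statistics (ties count 0.5)."""
    u_x = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


# Row (b) of Table 1: probability that a movement phase is followed
# immediately by a stationary phase.
stationary = [0.00, 0.00, 0.83, 0.35, 0.00, 0.95]            # subjects 7-12
moving = [1.00, 0.81, 0.89, 0.89, 0.83, 0.93,                # subjects 1-6
          1.00, 0.63, 0.89, 0.26, 1.00, 0.88]                # subjects 13-18

u = mann_whitney_u(stationary, moving)
print(u)  # 13.5
```

The value obtained, 13.5, is close to the reported U = 14; the small difference presumably reflects the treatment of the single tied pair or rounding in the original analysis.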


Introspective reports 


Subjects in the subliminal condition reported that the dimly illuminated screen, that surrounded 
the spot source of light, spontaneously disappeared and reappeared. More importantly, however, 
these subjects also reported that the periods of apparent movement of the spot of light coincided 
with the disappearance of the surrounding screen. During the post-experimental interrogation, a 
number of subjects suggested that the purpose of the experiment was to investigate the 
relationship between the movement of the light and the presence of the surrounding frame. 


Discussion 


The results demonstrate that the course of autokinesis is sensitive to the real movement of a 
surrounding pattern, even when this pattern is presented at a subliminal level. Consistent with 
Wallach’s (1959) observations, movement in a direction opposite to that of the real movement 
was induced when the surrounding pattern was presented at a supraliminal level. In the 
subliminal condition, movement of the pattern served to induce periods of stationarity between 
the phases of upward and downward apparent movement. These periods were relatively brief 
compared to those observed in control conditions. These preliminary observations therefore 
confirm Walker’s (1975) finding that the movement of a subliminal stimulus may be 
discriminated. 

The observations are also consistent with the notion that the superior colliculus-posterior 
association cortex system is responsible for mediating subliminal perception (Walker, 1976b) 
since this system is particularly sensitive to movement as a stimulus parameter (cf. for example, 
McIlwain & Buser, 1967; Humphrey, 1968; Ikeda & Wright, 1972) and is particularly important 
for visual discrimination behaviour under conditions of low illumination (cf. Bender, 1973). 
Indeed, Pöppel, Held & Frost (1973) have demonstrated that in patients suffering geniculo-striate 
lesions, the midbrain visual system is capable of mediating an orienting response to a moving 
stimulus that is not perceived. According to Walker (1976b) the midbrain-posterior association 
cortex system is responsible for feeding forward information regarding unattended/unperceived 
stimuli, and for initiating a feedback matching process whose output is identified with perceptual 
experience. The feedback matching process is considered to rely upon sensory information 
provided by the geniculo-striate system, which is less sensitive than the midbrain system to 
moving stimuli of low contrast. Thus, Walker (1976b) has suggested that whilst a subliminal 
stimulus may be adequate to engage the midbrain visual system, and hence influence the 
feedback matching process and the perception of other stimuli (e.g. the surrounding screen in 
the present experiment, cf. below), it may not influence activity in the geniculo-striate system 
sufficiently to be itself perceived. 

A tentative explanation of the nature of the effects in the subliminal condition may be 
suggested, on the basis of subjects’ reports that periods of stationarity of the spot of light 
coincided with periods when the surrounding screen was perceived. These reports suggest that 
the effects on autokinesis were mediated indirectly, via the effect that the real movement had on 
the behaviour of the image of the screen. Thus, the subliminal movement appears to have 
encouraged the reappearance of the screen and thereby given rise to conditions that are 
counterproductive to apparent movement. The results indicate that this reappearance of the 
screen, that was encouraged by the presence of the subliminal movement, occurred whilst there 
was a momentary cessation in autokinesis, as its direction changed. That the image of the screen 
should have disappeared is understandable (cf. Evans, 1973) since the steady fixation of large 
stimuli under conditions of dim illumination gives rise to the phenomena that are typically 
obtained with more rigid stabilizing procedures (cf. also Evans & Piggins, 1963). If it is assumed 
that images behave in a similar manner in rivalry and under conditions of stabilized viewing (cf. 
Walker, 1976a) some support for the tentative explanation comes from the previous study 
(Walker, 1975). In this study, superimposing a moving, subliminal pattern on an homogeneous 
field served to reduce the time period for which the field disappeared. 

Finally, the unexpected observation that the course of autokinesis on the second and third 
trials in the subliminal condition simply reflected the course of the first trial is difficult to explain. 
It may be suggested that subjects generated expectations as to the course the apparent 
movement would take, and that these expectations later governed the observed pattern of 
autokinesis. That the phenomenon is very much determined by a subject’s expectations has been 
demonstrated (Sherif, 1935; Rechtschaffen & Mednick, 1955). 


An intra-cultural investigation of susceptibility to ‘perspective’ and 
‘non-perspective’ spatial illusions 


A. Ahluwalia 


Conventional Müller-Lyer and modified Müller-Lyer (without ‘perspective’ cues) illusions were presented to 
two samples of children aged between eight and 19, matched in education, but living in ‘carpentered’ and 
‘uncarpentered’ environments in Zambia. Traditional differences in susceptibility have been obtained with 
both the variations of the Müller-Lyer illusion. In view of the lack of perspective cues in one of these, it is 
concluded that the perspective theory as presented within the ‘carpentered world hypothesis’ is inadequate. 
Since these differences are intra-cultural, they also do not support the hypothesis which suggests that 
cross-cultural variations in illusion susceptibility are due to genetic factors, such as macular (or retinal) 
pigmentation. 


Over the past 70 years much evidence has accumulated that shows systematic differences in 
geometrical illusion susceptibility between people living in western cultures and those living in 
primitive cultures (Rivers, 1901, 1905; Bonte, 1962; Segall, Campbell & Herskovits, 1963, 1966; 
Stewart, 1973; Weaver, 1974). Typically the findings indicate that people brought up in western 
cultures are more susceptible to the Müller-Lyer illusion (M-L). To explain these differences 
Segall et al. put forward a hypothesis proposing that illusion susceptibility is a perceptual 
by-product (or inference habit) learnt from the early visual environment. This they argued using 
Brunswikian ideas of ecological cue validity. That is, the developing visual system adapts to the 
environmental cues available (Brunswik, 1956). This was an attractive explanation which 
appeared to fit the cross-cultural results, since as far as the M-L was concerned the greater 
susceptibility shown by western observers could be directly attributed to the abundance of 
rectangularity in the environment. 

Various detailed theories based upon this hypothesis have been put forward. Of these the 
perspective theory is possibly the most extensively debated (Gregory, 1963, 1965, 1966). Briefly 
this proposes that the vertices in the M-L are interpreted by the subject out of all context as 
representing corners of a rectangular body, and since one pair of them is in opposite 
configuration to the other pair, then one of the edges (the line that joins two corners) is assumed 
to be further away than the other edge. Therefore, due to this phenomenon termed ‘misapplied 
constancy scaling’ and the presence of apparent perspective cues in the M-L figure, one of the 
lines appears longer. Some of the evidence used as a basis for this theory was Schiller & 
Wiener’s (1962) conclusions that the M-L, since it exists when presented binocularly (the ‘fins’ 
to one eye and the shafts to the other), must be caused by centrally located processes. The 
cross-cultural differences also support this and further indicate that these (processes) are 
cortical. But, one argument against such a detailed theory is that if the conventional M-L with 
fins at both ends of the shafts is modified by replacing the fins with circles (or other figures) the 
illusion still remains. Gregory maintains that the fins indicate the presence of perspective cues; 
however, the same cannot be said of the circles in the modified M-L (Zanforlin, 1967). Much 
effort has been made to test the perspective theory rigorously, and as it stands it has many critics 
(Day, 1965; Hamilton, 1966; Wallace, 1966; Georgeson & Blakemore, 1973). 

In their study Georgeson & Blakemore showed that the addition of depth cues to the 
conventional M-L, so that not only are the apparent perspective cues present but also binocular 
depth (in which the fins appear in three dimensions), affects illusion susceptibility in a way 
contrary to that predicted by the perspective theory. Another argument frequently raised against 
this theory concerns learning. If learning is indeed an important factor then the relationship 
between susceptibility and learning is not simple. The commonest occurrence in M-L studies is 
that susceptibility decreases with increasing age. Therefore to validate the perspective theory 
one has to assume that the learning which affects the M-L reaches an early asymptote and other 
negative processes tend to decrease the susceptibility following this stage. In fact Segall et al. 
(1966) have suggested that the inference habits are fully developed during the first three to four 
years of life (where susceptibility should be maximum) and the negative processes which they 
term ‘perceptual cognitive development’ are slower, possibly lasting several years. 

In recent years some controversy has been generated as a result of certain findings which 
suggest that the typical cross-cultural variations can be explained in terms of certain genetic 
differences between the commonest ‘carpentered’ and ‘uncarpentered’ samples used in previous 
studies, and that a similar genetically defined developmental process may be causing the decline 
in susceptibility during ontogeny. Pollack (1963) demonstrated using tachistoscopic presentation 
that an increase in the threshold for contour detection is accompanied by a decrease in M-L 
susceptibility as the subject age level ascended from eight to 12 years. It was suggested that this 
was due to an increase in concentration of macular pigmentation which is positively correlated 
with the threshold for contour detection and age (Pollack, 1963, 1969). Pollack & Silvar (1967) 
showed that the concentration of macular pigmentation is also positively correlated with the 
darkness of skin pigmentation, and that black American children were less susceptible to the 
M-L than white American children brought up in the same highly carpentered environment. 
From this they concluded that since most cross-cultural studies have contained dark pigmented 
uncarpentered subjects and light pigmented carpentered subjects, the susceptibility differences 
can be simply attributed to these genetic factors. In fact Segall et al.’s (1966) data provide some 
support for this hypothesis, for they observed that their uncarpentered sample also showed a 
decrease in susceptibility as a function of increasing age. This finding is obviously better 
explained in terms of the Pollackian notion. However, it should be pointed out that some 
workers have failed to replicate Pollack’s findings (Armstrong, Rubin, Stewart & Kunter, 1970; 
Stewart, 1973; Weaver, 1974) whereas others have confirmed them (Berry, 1971; Jahoda, 1971). 

Stewart’s (1973) study, conducted in Zambia, measured the susceptibility of children (ages 6-17 
yr) to the M-L and Sander parallelogram illusions. Race (and in some samples also 
education) was held constant and the carpenteredness of the environment varied from a very low 
level to higher levels. The results showed a gradual increase in susceptibility as the 
carpenteredness increased for both the illusions. Besides this, Stewart conducted a similar study 
in Evanston, Illinois, in which the carpenteredness was held constant and race varied (from 
white to black children). No susceptibility differences were discovered and the usual age-related 
decline was present throughout the sample. Similar results were obtained by Weaver (1974) in an 
intra-cultural study conducted in Ghana. His work provides rigorous evidence against the 
Pollackian explanation since it incorporated material, method and procedure similar to Pollack 
(1963) and Pollack & Silvar (1967). Brightness contrast as well as hue contrast stimuli were 
administered using a tachistoscope with a short exposure time (which is a proposed prerequisite 
for demonstrating variations in contour detection threshold between races and ages). 

The present study was undertaken to investigate whether the traditional ‘carpentered world’ 
findings can be replicated while controlling for education and employing a simple apparatus 
similar in theoretical design to that used by Rivers (1901, 1905). Since the testing was to take 
place intra-culturally, the Pollackian ideas relating to cross-cultural differences were also under 
examination. More specifically if the carpentered world hypothesis is indeed a plausible one, 
then by using a modified form of M-L the Gregorian basis of the perspective theory can be 
scrutinized, apart from the fact that this explanation fails to explain why a modified M-L should be 
illusory to a thoroughly carpentered subject (Zanforlin, 1967). This was done to give a fair 
test to the notions of misapplied constancy scaling since so much attention has been given to it. 


Method 

Subjects 


Carpentered environment. Five urban Lusaka (the capital of Zambia) schools were chosen and five subjects 
clustered into each of the 18 age (8, 9, 10, 11, 12, 14, 17, 18, 19 years) and sex groups. An extra effort was 
made to ensure that all the individuals were born in, and had lived all their lives in Lusaka or one of the 
other urban capitals. 


Uncarpentered environment. The majority of this sample lived within a 20 mile radius of Mulobezi, and some 
subjects were also obtained from Shesheke, both in the Western Province of Zambia. Mulobezi is in the heart 
of the Zambian timber industry; however, apart from a few colonial type wooden domestic dwellings and 
one brick school, there are no buildings that may contribute towards a carpentered environment. Nearly all 
the accommodation in the surrounding villages consists of mud huts constructed from timber, grass (roof) and 
mud. The design of these huts is round (perimeter walls) with circular roofs. There are no tarmac roads in the 
vicinity, the nearest being the Shesheke-Livingstone highway some 70 miles away, and movement is thus 
only possible on foot or in Landrovers. This is certainly an advantage since the hypothesis being tested 
requires that the subjects be particularly confined to a controlled environment throughout their past lives. 
Two schools were selected from this area and some subjects were also obtained from villages who either 
attended one of these two or a third one not visited. All these schools contained pupils of younger ages (8-14 
years). The 17-19 sample was derived from villages that contained these age groups matched in education 
with their carpentered counterparts, the residue was obtained from a secondary school in Shesheke some 120 
miles from Mulobezi. This school was much larger, more architectured, and situated in a relatively (to 
Mulobezi) more carpentered environment. To eliminate the uncertainty introduced by this factor as much as 
possible only those pupils who were born in villages, and had spent most of their primary school lives there 
were considered. In fact Segall et al. (1966) have emphasized that the inference habits are possibly fully 
developed by the third or fourth year of life. Therefore for the present purposes this sample may be 
considered just as uncarpentered as the one just described. The uncarpentered sample also contained a total 
of 90 subjects with each group of five matched in sex, age and education with the 90 carpentered 
counterparts. 


Apparatus 


Five stimuli were constructed in which the subject was required to make an adjustable line the same length 
as a standard line also appearing in the field. All these had constant field size and were presented in the same 
wooden holder one at a time. The basic design of these is shown in Fig. 1. (Note: only three, one control, 
one conventional M-L and one modified M-L, are shown here; there were two colour variations in each of 
these M-Ls, thus a total of five stimuli.) The standard line in the control stimulus and in all M-L variations 
was 100 mm long (all lines in the stimuli were 2 mm thick), the adjustable being always on the right and the 
top of the standard line 50 mm below the top of the adjustable. These two were laterally separated by 80 
mm. The adjustable mechanism comprised a sliding section (with a push-pull tag) fitted freely into a groove 
system. This section overlapped the underlying line variably so that it could be manipulated to expose any 
desired length of the line (thus ‘the adjustable line’). All the construction was out of heavy smooth 
cardboard with all background paintwork matching approximately N 7.5/ and the black lines approximately 
N 2.5/ Munsell specification. A millimetre scale was printed on the back of the sliding sections which could 
be read through a rectangular window from behind the stimulus. The reading was very simple: when a 
perfect match occurred an arrow on the side of the window pointed at 0 mm, with discrepant matches being 
indicated by positive and negative values as the case may be. Since in the control and all the M-L variations 
the standard line was 100 mm, each millimetre mismatch also meant 1 per cent error. 

There are two types of conventional M-L, each obtained by drawing fins at the ends of the two lines in the 
control stimulus (Fig. 1). In all the M-Ls the line usually perceived to be shorter always appeared on the left 
(thus ‘the standard’). In one of the conventional M-Ls, the fins were blue (approximately 2.5B 6/8) and in 
the other they were red (approximately 2.5R 4/12). The use of different colours was assumed to have 
practical as well as psychological advantages. First, it would be easier for the subjects to ‘see’ what was 
to be matched; secondly, the use of two separate stimuli, different for the subject but similar for 
most theoretical purposes, could provide a control in itself. The colours should also help maintain the subject’s 
interest by offering a ‘novel’ task every so often instead of one ‘boring’ one for all the trials. The angles between 
the fins were made 90° symmetrical and the length of each fin 25 mm. 

In the modified M-L, instead of the fins, 26 mm internal diameter circles were drawn. One stimulus had 
blue circles (approximately 2.5PB 3/8) and the other had red (approximately 2.5R 4/12). 

Figure 1. Design of the stimuli: (a) the control, (b) the conventional Müller-Lyer illusion (M-L), (c) the 
modified M-L. The presentation was by placing these in a wooden stimulus holder (not shown) one at a time. 
In each stimulus one of the lines could be adjusted to different lengths by occluding or revealing it with a 
sliding section freely fitted into a groove system. This was possible by pulling or pushing a tag (shown 
hatched in the drawings). The section also had a millimetre scale printed on its back which could be read 
through a rectangular window from behind the stimulus. Note that all the M-L variations are simply obtained 
by drawing the lines in the control under different contexts. There were a total of 4 M-Ls; red and blue 
finned conventional, and red and blue circled modified. 


During construction much effort was made to ensure that the paintwork was as smooth as possible so that 
the criticisms levelled against Bonte’s (1962) apparatus should not be valid for these stimuli. 

The presentation to the Lusaka sample took place in a brightly lit empty classroom, and similarly for most 
of the uncarpentered sample, except for those subjects who were tested in villages; they were tested on the 
back of an open Landrover under daylight conditions. In all testing the subject sat on a stool or chair in 
front of a table on which the wooden stimulus holder was placed. The experimenter sat on the left of the 
subject and an assistant behind the stimulus screen. 


Procedure 


All the tests were conducted with individual subjects after telling each of them that we were only interested 
in investigating how well they could see and that no grades would be given. With a majority of the 
carpentered sample English was the means of communication and one of the local languages was used if the 
need arose. This was done with the help of the assistant, who was fluent in all the common local languages. 
Where possible English was also used with the uncarpentered sample, but in most cases chi-Lozi, the 
Western Province language, had to be used. Stereotyped instructions were maintained throughout, and 
it was clear following the control experiment whether a subject had understood the task well. 

All the subjects were initially given a comprehension test using the control stimuli, in which the adjustable 
line was manipulated so that the two lines were of obviously unequal length. The subject was required to point 
at the shorter or longer of the two. This was repeated using various combinations of lengths and if no wrong 
answers were obtained the subject was tested further, otherwise he or she was rejected. Now the 
experimenter made the adjustable the same length as the standard line with much emphasis upon how this 
was ‘carefully’ done. When the two were of equal length the subject was again asked to point at the longer 
or shorter of the two and if the verbal answer was that they are equal, the subject was allowed to continue. 
In case of a wrong answer he or she again went through the procedure from the start and if failure to obtain 
an adequate response occurred this time, then the subject was rejected. After this, the subject was allowed 
to handle the apparatus and to become familiarized with it. He or she was then asked to leave the apparatus 
alone and the experimenter made the adjustable line as long (160 mm) or as short (0 mm) as possible. The 
subject was now instructed to make the adjustable line the same length as the standard and when this task 
was complete they were required to take their hands off the table and utter ‘yes’. This trial performance was 
recorded and the procedure repeated with the adjustable line at the opposite extreme initially. If both these 
trials indicated that the subject could match to within an accuracy of ±10 mm (that is, within 10 per cent 
of a perfect match), then the experiment was continued with him or her. The subjects who failed to satisfy this 
criterion were rejected. (However those who succeeded were usually found to be much better than this 
arbitrary limit.) Altogether for the control, six successive trials were given, three ascending and three 
descending randomly distributed within each subject. 
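
The screening rule for the two practice trials can be written out as a short sketch. It assumes that the ±10 mm limit is inclusive and that readings are signed mismatches in millimetres as read from the scale behind the stimulus; the names are illustrative and do not come from the original procedure.

```python
STANDARD_MM = 100  # length of the standard line


def percent_error(mismatch_mm):
    """Signed mismatch as a percentage of the 100 mm standard line."""
    return 100.0 * mismatch_mm / STANDARD_MM


def passes_screening(practice_mismatches_mm, tolerance_mm=10):
    """True if every practice trial is within the tolerance (inclusive)."""
    return all(abs(m) <= tolerance_mm for m in practice_mismatches_mm)


# Hypothetical readings from the ascending and descending practice trials.
accepted = passes_screening([4, -7])
rejected = not passes_screening([3, -12])
```

Because the standard line is 100 mm, a reading of -7 mm corresponds to an error of -7 per cent.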

With all the M-Ls no mention was made of any colours in the stimulus field. The illusory black lines were 
emphasized to the subjects by running their index finger tip over the lengths. Each of the M-Ls was initially 
used in a similar comprehension test as the control. Then the adjustable was made as short or as long as 
possible and the subject instructed to make the two the same length. Altogether four such trials, two 
ascending and two descending were administered for each of the M-L variations. The order of presentation 
of the four M-Ls was randomized within each subject to avoid a possible artifactual learning trend. 


Results 


For each subject there were 22 scores (16 for the M-Ls and 6 for the control). Each subject’s 
performances upon the five stimuli were collapsed into five single means. Using a computer two 
separate analyses of variance (Anova) were performed upon these scores. 

For the control a simple independent factor age (9) × environment (2) × sex (2) × subjects (5) 
design was used. A mixed design with three independent and two repeated factors was employed 
for all the M-L data. This was age (9) × environment (2) × sex (2) × (M-L-type (2) × colour 
(2) × subjects (5)). The M-L-type refers to the two variations (conventional and modified) and 
colour refers to the red/blue fins or circles. The F ratios and significance values were calculated 
from the computer outputs using the relevant error terms. 
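
The counts implied by this design can be checked with a few lines of arithmetic. The sketch below uses only the factor levels and trial numbers given in the text; the error degrees of freedom follow the standard formula for subjects nested within cells, which is assumed here rather than quoted from the original analysis.

```python
from math import prod

# Between-subject factors and their numbers of levels.
between_levels = {"age": 9, "environment": 2, "sex": 2}
subjects_per_cell = 5

cells = prod(between_levels.values())            # 36 cells
n_subjects = cells * subjects_per_cell           # 180 subjects
error_df = cells * (subjects_per_cell - 1)       # 144

# Scores per subject: four trials on each of four M-Ls, six on the control.
ml_scores = 4 * 4
control_scores = 6
scores_per_subject = ml_scores + control_scores  # 22

print(n_subjects, error_df, scores_per_subject)  # 180 144 22
```

These figures agree with the two samples of 90 subjects, the 22 scores per subject, and the 144 error degrees of freedom quoted with the between-subject F ratios.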

Upon the control scores Anova showed no main effects or interactions. Therefore this 
indicates that the apparatus/procedure was manipulated and instructions understood as required, 
and further that the lines were not producing an illusory artifact just because of their relative 
positions in the stimulus field. More importantly all the subjects tested upon the subsequent 
illusions were able to accurately match the two lines in the control. 

In all the M-L data a declining susceptibility with increasing age has been observed. This is 
clear in Figs 2a (female) and 2b (male) where each age group’s mean illusion susceptibility 
pooled across the colours has been plotted. This traditional observation in the present study is 
confirmed by the highly significant effect of age (F = 67.46, d.f. 8, 144, P < 0.005). Although less 
clear the uncarpentered sample also shows this trend. Table 1 gives the full Anova for the 
M-L data. 

Of all the main effects environment is the most significant (F = 496.16, d.f. 1, 144, P < 0.005)
with the uncarpentered half of the sample being far less susceptible to all the M-L variations.
Table 2 shows the strengths for all the M-L variations averaged across all the subjects in a given
Table 1. The full analysis of variance (Anova) for the M-L data. Shape (of the context: circles
or fins) refers to the two Müller-Lyer illusion variations.

Source             d.f.   M.S.          F

Age (A)            8      747.092010    67.46***
Environment (E)    1      5494.612500   496.16***
Sex (S)            1      59.512500     5.37*
Shape (Sh.)        1      698.168060    83.95***
Colour (C)         1      218.901390    2.85 n.s.
A×E                8      178.428130    16.11***
A×Sh.              8      70.489931     8.48***
A×S                8      96.196875     8.69***
A×C                8      10.398264     <1 n.s.
E×S                1      71.568056     6.46*
E×Sh.              1      230.068060    27.67***
E×C                1      13.068056     <1 n.s.
S×Sh.              1      6.234722      <1 n.s.
S×C                1      87.501389     1.14 n.s.
Sh.×C              1      0.868056      <1 n.s.
A×E×S              8      76.214931     6.88***
A×S×Sh.            8      32.762847     3.94***
A×E×Sh.            8      40.214931     4.84***
A×E×C              8      3.827431      <1 n.s.
A×S×C              8      6.604514      <1 n.s.
A×Sh.×C            8      13.283681     <1 n.s.
E×S×Sh.            1      0.068055      <1 n.s.
E×S×C              1      3.334722      <1 n.s.
E×Sh.×C            1      146.701390    8.99***
S×Sh.×C            1      35.112500     2.15 n.s.
A×E×S×Sh.          8      21.433681     2.58*
A×E×S×C            8      2.412847      <1 n.s.
A×E×Sh.×C          8      4.979514      <1 n.s.
A×S×Sh.×C          8      13.921875     <1 n.s.
E×S×Sh.×C          1      4.834722      <1 n.s.
A×E×S×Sh.×C        8      16.694097     <1 n.s.

* P < 0.025; ** P < 0.01; *** P < 0.005


category of the sample. It is again clear that the uncarpentered sample in all cases was much less 
susceptible to the M-Ls. 

Anova also produced a significant main effect of sex (F = 5.37, d.f. 1, 144, P < 0.025). The
summation of means over all the ages indicates the males in the carpentered sample to be more
susceptible, especially to the modified M-L. As should have been expected there is no main
effect of colour. However, there is a significant susceptibility difference between the M-L
variations (F = 83.95, d.f. 1, 144, P < 0.005). This is being produced mainly by the younger
members of the sample, with both environmental groups showing greater susceptibility to the
modified M-L (see Table 2). There is further a significant interaction between environment and
age (F = 16.11, d.f. 8, 144, P < 0.005). The general susceptibility of the two environmental
samples is clearly linear and convergent. This observation indicates that some previous research 
(Gregor & McPherson, 1965; Berry, 1966; Jahoda, 1966) has failed to discover intra-cultural 
effects of environment because, saving other criticisms, the subjects in these studies were 
essentially adult. Further work must take this into consideration since it is clear that 
Figure 2. Susceptibility to conventional and modified Müller-Lyer illusions (M-Ls) of carpentered (carp.) and
uncarpentered (uncarp.) subjects plotted against age for (a) females and (b) males.

The ordinate gives the mean percentage illusion strength or underestimation in millimeters of the line 
which is usually perceived to be longer. Since the standard line was 100 mm both these values are 
coincident. The performances have been pooled across the scores for different colour fins/circles in the M-L 
variations. (As expected the analysis of variance showed colour to be a non-significant factor.) 


Table 2. Overall comparison of mean illusion susceptibility of carpentered (carp.) and
uncarpentered (uncarp.) people

The means have been taken across all ages in a given sample. All values are in percentage illusion strength.
It is clear that most of the subjects were more susceptible to the modified Müller-Lyer illusion. Generally the
males of the carpentered environment show greater illusion susceptibility than their counterpart females. The
most interesting finding is the marked difference in susceptibility between the environmental groups to all
the illusions (bottom two rows).

Sample                 Conventional M-L   Modified M-L
Male carp.             14.9               15.9
Male uncarp.           7.4                10.5
Female carp.           13.5               14.1
Female uncarp.         7.5                10.4
Male+female carp.      14.2               15.0
Male+female uncarp.    7.4                10.5


susceptibility differences are obvious in all the younger subjects of the present sample and less
so in the 17, 18 and 19 age groups. There are other second- and higher order interactions which
reached significant levels, as can be seen in Table 1; however, they do not concern the present
discussion.


Discussion 

The results presented are clear cut for the conventional Müller-Lyer illusion (M-L). These
findings confirm previous observations that people living in carpentered environments are more
susceptible to this illusion. Since the present results have been obtained using a very different
apparatus from that employed in most previous research, they strengthen the ‘carpentered world 
hypothesis’ as a possible explanation for cross-cultural and intra-cultural differences in 
perception. The fact that the usual age-related decline in susceptibility is evident throughout the 
M-L data further indicates that a true illusory phenomenon was measured. However, of 
theoretical interest is the novel discovery that similar differences are also obtained with the 
modified form of the M-L. Within the context of the carpentered world hypothesis, this finding 
poses a direct challenge to an explanation in terms of the perspective theory (Gregory, 1963, 
1965, 1966). It is clear that the modified M-L contains no perspective cues as are assumed to be
present in the conventional M-L; therefore a question of ‘triggering’ some learned process of
constancy scaling does not arise. This obviously raises further problems for the theory apart 
from its failure to explain why a modified version as used in the present study should be illusory 
to a thoroughly ‘carpentered’ subject. 

The results show a significant main effect of the M-L type; however, when examined closely the
functions for the two variations indicate comparable age trends. From this it seems reasonable to
conclude that both of them are caused by similar cognitive mechanisms and the subject employs
identical strategies when confronted with either. This is important for further theoretical work 
since an explanation based solely upon the conventional M-L is clearly inadequate. 

The present findings do not support an explanation of the cross-cultural differences in terms of 
genetic factors as proposed by Pollack & Silvar (1967). The criticism of previous research which 
failed to obtain similar results has been mentioned (see Results). However, the methods and 
material used in this study could not (since stimuli contained hue-contrast as well as brightness 
contrast; and the presentation was non-tachistoscopic) provide a test for the other Pollack notion 
(1963, 1969) which suggests that the decrease in M-L susceptibility with aging is due to an 
increase in the concentration of macular pigmentation (which increases the threshold for contour 
detection). In fact, as in Segall et al. (1966), some of the present uncarpentered sample also 
show the typical age-related decline in susceptibility. Therefore this could be used as evidence to 
support the macular pigmentation hypothesis as it relates to age. 


Acknowledgements 


This study was carried out as an assessed third year undergraduate project for the degree in Neurobiology of 
the University of Sussex. I would like to thank Professor N. S. Sutherland for initial encouragement and 
introduction letters, Dr R. Serpell of the University of Zambia for facilities and advice provided, and Dr M. 
A. Georgeson of the University of Sussex for help in statistics and a critical reading of an earlier manuscript. 


References 


Armstrong, R. E., Rubin, E., Stewart, M. &
Kunter, L. (1970). Susceptibility to the
Müller-Lyer, Sander Parallelogram and Ames
Distorted Room illusions as a function of age, sex
and retinal pigmentation among urban Midwestern
children. Unpublished.

Berry, J. W. (1966). Cultural determinants of
perception. Unpublished Doctoral Dissertation,
University of Edinburgh.

Berry, J. W. (1971). Müller-Lyer susceptibility:
Culture, ecology or race? Int. J. Psychol. 6,
193-197.

Bonte, M. (1962). The reaction of two African
societies to the Müller-Lyer illusion. J. soc.
Psychol. 58, 265-268.

Brunswik, E. (1956). Perception and the
Representative Design of Psychological
Experiments. Berkeley: University of California
Press.

Day, R. H. (1965). Inappropriate constancy
explanation of spatial distortions. Nature, Lond.
207, 891-893.

Georgeson, M. A. & Blakemore, C. (1973).
Apparent depth and the Müller-Lyer illusion.
Perception 2, 225-234.

Gregor, A. J. & McPherson, D. A. (1965). A
study of susceptibility to geometric illusions
among subgroups of Australian Aborigines.
Psychol. Africana 11, 1-13.

Gregory, R. L. (1963). Distortions of visual space
as inappropriate constancy scaling. Nature, Lond.
199, 678-680.

Gregory, R. L. (1965). Seeing in depth. Nature,
Lond. 207, 16-19.

Gregory, R. L. (1966). Eye and Brain. London:
Weidenfeld & Nicolson.

Hamilton, R. L. (1966). Susceptibility to the
Müller-Lyer illusion and its relationship to
differences in size constancy. Q. Jl exp. Psychol.
18, 63-72.

Jahoda, G. (1966). Geometric illusions and
environment: A study in Ghana. Br. J. Psychol.
57, 193-199.

Jahoda, G. (1971). Retinal pigmentation, illusion
susceptibility and space perception. Int. J.
Psychol. 6, 199-208.

Pollack, R. H. (1963). Contour detectability
thresholds as a function of chronological age.
Percept. mot. Skills 17, 411-417.

Pollack, R. H. (1969). Some implications of
ontogenetic changes in perception. In D. Elkind &
J. H. Flavell (eds), Studies in Cognitive
Development: Essays in Honor of Jean Piaget.
New York: Oxford University Press.

Pollack, R. H. & Silvar, S. (1967). Magnitude of
Müller-Lyer illusion in children as a function of
pigmentation of the Fundus oculi. Psychon. Sci. 8,
83-84.

Rivers, W. R. H. (1901). Vision. In A. C. Haddon
(ed.), Reports of the Cambridge Anthropological
Expedition to the Torres Straits, vol. II, pt. 1.
Cambridge: Cambridge University Press.

Rivers, W. R. H. (1905). Observations on the senses
of the Todas. Br. J. Psychol. 1, 321-396.

Segall, M. H., Campbell, D. T. & Herskovits,
M. J. (1963). Cultural differences in the perception
of geometric illusions. Sci., N.Y. 139, 769-771.

Segall, M. H., Campbell, D. T. & Herskovits,
M. J. (1966). The Influence of Culture on Visual
Perception. Indianapolis: Bobbs-Merrill.

Schiller, P. & Weiner, M. (1962). Binocular and
stereoscopic viewing of geometrical illusions.
Percept. mot. Skills 15, 739-747.

Stewart, V. M. (1973). Tests of the ‘carpentered
world’ hypotheses by race and environment in
America and Zambia. Int. J. Psychol. 8, 83-94.

Wallace, G. K. (1966). Optical illusions. Nature,
Lond. 209, 327-328.

Weaver, D. B. (1974). An intra-cultural test of
empiricistic vs. physiological explanation for
cross-cultural differences in geometric illusion
susceptibility using two illusions in Ghana.
Unpublished Doctoral Dissertation, Northwestern
University.

Zanforlin, M. (1967). Some observations on
Gregory’s theory of perceptual illusions. Q. Jl
exp. Psychol. 19, 193-197.


Received 2 January 1976; revised version received 17 February 1977. 


Requests for reprints should be addressed to Arun Ahluwalia, 275 Eastcote Road, Ruislip, Middx. 


Br. J. Psychol. (1978), 69, 243-255 Printed in Great Britain 243 


Memory for prose: Quantitative analysis of recall components


I. M. Cornish 





Previous work on recalling prose material can be criticized for its limited use of quantitative analysis and for 
neglecting the theoretical implications of the distinctions between verbatim and other forms of recall. A 
controlled set of nine passages was specially written, and administered over three sessions to 18 subjects.
Two analytic schemes were applied: one was based on clauses; the other, a well-defined one, used the actual
words to split reproduced material into verbatim, non-verbatim and intrusive components. These proved a
qualified success, but the use of systematic qualitative analysis in future was recommended. Most of the
variations in quantity of material recalled came from variations in the verbatim component only, whether
associated with inter- or intra-individual differences, and the importance of verbatim recall justified paying it
more attention in future. 


Many investigations into the recall of prose material have suffered from two major 
shortcomings: an inadequate use of quantitative analysis, and the ambiguous use of the 
distinction between verbatim and other measures of recall. Bartlett (1932), for instance, evaded 
both issues. He omitted quantification completely, and paid no special attention to whether the 
original words were recalled, unless marked semantic changes were involved. 

Quantitative analysis has so often been restricted to simple indices of the amount of material 
recalled, or to an evaluation of subjects’ ‘accuracy’ (e.g. King & Cofer, 1960; Howe, 1970; 
Brockway, Chmielewski & Cofer, 1974). Neither use seems to take full advantage of the 
material obtained from subjects. A simple a priori categorization of material, as ‘verbatim’, 
‘non-verbatim’, ‘intrusive’ and ‘omitted’, for instance, would be more realistic and less 
restricting. Kay (1955) used a scheme like this but, though aware of many inherent 
shortcomings, still restricted it to providing ‘a fair overall impression of the accuracy of a 
particular version’ (p. 84). By avoiding discussion of the theoretical implications of his recall 
components, and by not refining them in the light of qualitative analysis, he surely limited the 
conclusions he could draw from his data. 

Cofer (1941) and Howe (1970) employed similar quantitative analysis, but no qualitative 
analysis. Gomulicki (1956) and Zangwill (1956) did quantify selected qualitative findings, but 
restricted themselves to simple frequency counts. Oldfield & Zangwill (1938) looked at ‘errors’ 
using an eight-category breakdown, but discussed their interactions only in terms of learning 
passages by repetition, and largely ignored verbatim recall per se. None of these authors, 
however, has gone so far as to systematically observe the relations among recall components 
brought about by variation of factors present in their experimental designs. 

A related problem is whether to reckon recall in terms of sense (‘content’) or of literal recall 
of the original words. Separate verbatim and content measures have usually been regarded as 
alternative approximations to ‘quantity of recall’, little play being made of whether the 
distinction is an analytic convenience, or represents differences among underlying processes. 
Content measures really combine material recalled verbatim with non-verbatim material which 
nevertheless reproduces some of the original sense. Too often the latter is dismissed as ‘errors’ 
(e.g. Oldfield & Zangwill, 1938; Kay, 1955). 

Oldfield & Zangwill (1938) found with repeated presentation of passages that literal recall was 
produced by ‘progressive differentiation and organization’ of an initially ‘very general scheme’
possessing ‘a certain amount of dominant detail’, but seem to assume from the start that 
memory consists of one integrated record or set of processes. Brockway et al. (1974) developed 
a rating scale to measure the ‘accuracy’ of recalled statements, but the scale merges verbatim 
and content judgements, the former being implicitly regarded as a limiting case of the latter. 
Unsurprisingly, to learn a passage completely is more difficult by verbatim than by content
criteria (e.g. Cofer, 1941; Hunter, 1964). Some confusion arises because it is logically possible to 
perform perfectly by content criteria alone, yet not by verbatim criteria alone. Many writers 
reserve the term ‘verbatim recall’ for the completely learned case, and seem to forget that some 
material is recalled in the original words even after single presentations. This can be seen from 
any study using both kinds of measure, and it surely represents a situation closer to Bartlett’s 
(1932) ideal of ‘everyday’ remembering. Cofer (1973) has suggested that ‘there are two entirely 
different sets of recall processes’ in the verbatim/content or verbatim/non-verbatim distinction. 
Obviously this possibility is a neglected issue of great theoretical value. 

The present study, therefore, is an attempt to overcome many of these largely methodological
shortcomings. An analytic scheme like that suggested above was carefully defined to cover all 
reproduced material, but to be capable of future refinement. The experimental design permitted 
systematic observations of interrelations among these recall components. It was also found 
necessary to write a special set of passages to minimize effects resulting from using too few or 
too dissimilar passages (e.g. Alper & Korchin, 1952; Kay, 1955; Paul, 1959), or from an 
obviously synthetic appearance (e.g. Myers, Pezdek & Coulson, 1973; Brockway et al. 1974), 
frequent failings in the literature. 


Method 
Subjects 


Nine male and nine female undergraduates of Durham University, from a variety of disciplines, acted as 
subjects in this experiment. All were unpaid volunteers. 


Passages 


Nine passages, reproduced in the Appendix, were specially written by the experimenter to the following 
criteria, to produce a fairly homogeneous set of material. 

(i) Each contained exactly 225 words and 30 clauses (as defined in the Analysis section). The frequencies 
of clauses containing given numbers of words were the same for each passage: there were five each of 7 and 
8 words, four of 6 and 9, down to one each of 3 and 12 words. The mean number of words per clause was 
7.5, which a small-scale study suggested was a reasonable figure for prose fiction.

(ii) Each passage was written in an attempt to produce uniformity of certain trivial or extraneous factors:
style, compression, level of difficulty, etc. To this end, the single author was likely to be an advantage.

(iii) Each passage was subject to the criterion that it should be accepted by subjects as fairly ‘normal’,
story-like material. 

(iv) The nine passages were generated crudely from all combinations of three types of content and three 
types of form. Content was specified broadly as: ‘primitive/mythical’ (A), ‘familiar/domestic’ (B), and
‘technical/mechanical’ (C). Form was specified according to an a priori determination of the apparent
interrelations among clauses. In ‘linear’ passages (1), each clause seems to follow from the previous one; in
‘branching’ ones (2), there was a linear sequence with ‘side branches’; in ‘nodal’ ones (3), there was just a
collection of ‘branches’ sharing a common starting-point, but without any dominant sequence. These
deliberate differences were an attempt to produce a ‘controlled variety’ among passages, including
essentially narrative, descriptive and mixed forms (cf. Gomulicki, 1956). The influence on recall of passage
construction will be left for analysis elsewhere, and little reference will be made to it in the present paper.

(v) Despite such similarities as were introduced, however, care was taken to leave sufficient dissimilarities, 
particularly of content, to avoid confusing subjects. 
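
The word and clause counts in criterion (i) can be checked with a little arithmetic. The sketch below is illustrative only. The frequencies for clauses of 4, 5, 10 and 11 words are not stated explicitly; they are an assumption, filled in from the phrase ‘down to one each of 3 and 12 words’, and the variable names are hypothetical.

```python
# Clause-length frequencies implied by criterion (i): word count -> number of clauses.
# Stated in the text: five clauses each of 7 and 8 words, four of 6 and 9,
# one each of 3 and 12. Assumed: three each of 5 and 10, two each of 4 and 11.
clause_length_counts = {3: 1, 4: 2, 5: 3, 6: 4, 7: 5, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}

n_clauses = sum(clause_length_counts.values())
n_words = sum(length * count for length, count in clause_length_counts.items())
mean_words_per_clause = n_words / n_clauses

print(n_clauses, n_words, mean_words_per_clause)
```

With these frequencies the totals come to 30 clauses and 225 words, a mean of 7.5 words per clause, which matches the figures given in criterion (i).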


Design 


Each subject was given three passages on each of three sessions, several days apart, and the distribution of 
passages among these nine trials was subject to a number of criteria: 

(i) No session contained more than one passage of each form and each content. 

(ii) No subject received more than one passage of each form and each content in the same within-session 
trial position. 

(iii) Over all subjects, each type of content was followed by each other type an equal number of times. 
(iv) Over all subjects, each passage occurred in each of the nine positions exactly twice.
(v) Each subject, of course, received each passage once only. 
(vi) Subjects were allocated to presentation sequences at random. 
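
Criteria (i), (ii) and (v) constrain each subject's own presentation sequence and can be checked mechanically. The following is a minimal sketch, not the procedure actually used to construct the sequences. Passage labels such as "1A" combine the form number and content letter from the Passages section; the function name and the example sequence are hypothetical.

```python
def check_sequence(sequence):
    """Check criteria (i), (ii) and (v) for one subject's order of nine passages.

    `sequence` lists passage labels such as "1A" (form digit, content letter)
    in presentation order: trials 0-2 are session 1, 3-5 session 2, 6-8 session 3.
    """
    if len(sequence) != 9 or len(set(sequence)) != 9:     # criterion (v)
        return False
    sessions = [sequence[i:i + 3] for i in range(0, 9, 3)]
    positions = [sequence[i::3] for i in range(3)]         # same within-session slot
    for group in sessions + positions:                     # criteria (i) and (ii)
        forms = {label[0] for label in group}
        contents = {label[1] for label in group}
        if len(forms) != 3 or len(contents) != 3:
            return False
    return True


# Hypothetical sequence, not one taken from the study.
example = ["1A", "2B", "3C", "2C", "3A", "1B", "3B", "1C", "2A"]
print(check_sequence(example))

# Criterion (iv) is consistent with the sample size:
# 18 subjects x 9 trials = 9 passages x 9 positions x 2 occurrences.
print(18 * 9 == 9 * 9 * 2)
```

Criteria (iii) and (iv) concern the whole set of 18 sequences rather than any single one, so they would need a check across all subjects that this sketch does not attempt.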


Instructions 


It was thought likely that the instructions to subjects might be an especially critical part of the procedure. 
Those used were designed to strike a balance between encouraging subjects to recall all they could think of, 
regardless of certainty or accuracy, and asking them to recall only what they were certain of, or only the 
original words. The very nature of a ‘test’ could lead subjects towards assuming that accuracy, of whatever 
kind, was a foremost consideration. An attempt was made in the instructions and setting to dispel such 
assumptions while not preventing subjects from reproducing original wording whenever this could be 
recalled. 

The instructions themselves are presented during the Procedure. The complete forms were given only on 
first occurrence; repetitions were always of abridged versions, otherwise they would have been redundant 
and tedious to all concerned. All instructions were read to subjects by the experimenter. 


Procedure 


Subjects were tested individually or, occasionally, in pairs, in a ‘relaxed’ environment, a typical student 
study-bedroom such as they all might be familiar with (cf. Zangwill, 1939). The preliminary instructions were 
read: ‘This is an experiment to find out how people understand and remember prose passages of various 
sorts. There will be three sessions of which this is the first; each session will follow the same procedure. In 
each session you will read three short passages, making nine altogether. They are all different, but are of 
about the same length; all passages will be given to you on slips of paper, typewritten. After each passage 
you will be asked to write out as much as you can remember. This will be repeated for each of the three 
passages in each of the three sessions. Each session will last about 45 min. Stop me at any time in a session 
if you are not sure about something. You will be asked at the end of every session not to mention anything 
about any of the passages to anybody else who might be taking part, as this could invalidate their results. 
Are there any questions?’ 

Subjects were then given the first passage face down; it was untitled and typewritten on a slip of paper, 
and followed by instructions for reading: ‘You have now been given the first passage to read. The passages 
are very short and are all the same length. You should read each one through twice, remember twice only, at 
your normal reading speed. I want you to read each just as you would read a passage in a book or a
newspaper. I only want you to follow the passage quite normally, to understand it, and, if possible, to enjoy 
it. I do not want you to make any special effort to commit any of it to memory. In particular, I am not 
interested in how accurate your memory is for the precise wording of the passage. Any questions? You can 
start now, and let me know when you finish.’ 

Subjects turned over their slip of paper and read the passage. When finished, they were given a pen and
A4 sheet of ruled note-paper, and were read the recall instructions: ‘Now, I am going to ask you to write 
down as much of the passage as you can remember. I am not interested in the exact words used originally, 
but if you do happen to remember them so much the better. Take your time over this part of the experiment: 
there’s no need to hurry. If there is anything you remember that you are not sure of, underline it in your 
account: there may be quite a bit you can’t recall, so don’t worry about it. When you have finished, check 
through what you have written, and make any corrections or additions you want to, using footnotes if you 
like. Spelling doesn’t matter, and neither does punctuation. Are there any questions? Don't write your name 
on the paper, begin when you are ready, and let me know when you finish.’ 

The interval between finishing reading a passage and beginning the written reproduction was usually made 
to last at least half a minute, often rather longer. While subjects were recalling the passages, the 
experimenter busied himself at his desk with such miscellaneous activities as reading, writing and 
paper-sorting. When they seemed to have completed recall to their satisfaction, they were reminded to check
through their scripts, which were collected. Subsequent passages in the first session, and all passages in 
subsequent sessions, used abridged and suitably amended instructions. Each session ended with the 
reminder: ‘Finally, I would like you not to mention anything about any of the passages to anybody else who 
might be involved in this experiment, as this could invalidate their results.’ 

Sessions lasted 45 min on average, with extremes of about 30 and 60 min. At the end of the last session,
those subjects who were interested were given a brief account of the nature and purpose of the experiment. 
Word scores

Initially, the experimenter devised a set of marking criteria elaborated from the scheme already 
suggested: this classified each word of a subject’s reproduction as verbatim (V), non-verbatim 
(X) or intrusive (I), and counted the number of words altogether (W) of each type. Although 
satisfied with his own consistency, the experimenter was aware of making arbitrary decisions on 
many small points, which gave rise to doubts about the reproducibility of his results by other 
judges. A small-scale marking study was carried out to assess reproducibility and to refine the 
initial definitions. 

Briefly, three independent judges were given simple definitions of the recall components, to 
encourage divergent interpretation. They re-marked copies of seven subjects’ versions of passage
2C from the present study. Kendall Coefficients of Concordance for the agreement among the
judges and the experimenter were V (0.96), X (0.67), I (0.83) and W (0.99); P < 0.01 in each case.
Agreement was better on the relative magnitudes of the three scores than on their absolute 
magnitudes. Qualitative examination of judges’ differences led to much fuller and less ambiguous 
marking criteria for V, X, I and W, on which the following definitions have been based. 
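
For readers unfamiliar with the statistic, the coefficient of concordance can be sketched as follows. This is an illustration under stated assumptions, not the computation used in the marking study: it uses the standard formula W = 12S / (m²(n³ − n)) with no correction for ties, and the scores in the example are invented. The function names are hypothetical.

```python
def average_ranks(scores):
    """Rank scores from 1 (lowest) upward, giving tied scores their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kendall_w(ratings):
    """Kendall's W for m markers (rows) scoring the same n scripts (columns)."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[j] for row in ranked) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Invented verbatim counts from four markers for seven scripts (not study data).
invented_v_scores = [
    [80, 95, 60, 110, 72, 130, 101],
    [82, 93, 61, 108, 70, 128, 99],
    [79, 96, 63, 111, 74, 127, 103],
    [81, 90, 58, 112, 92, 131, 100],
]
print(round(kendall_w(invented_v_scores), 3))
```

W is 1 when all markers rank the scripts identically and falls towards 0 as their rank orders diverge, which is why it reflects agreement on relative rather than absolute magnitudes.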

(i) Preliminary examination: Subjects’ scripts were checked through to exclude non-textual 
notes or asides and all but the first item from sets of alternatives. Obviously omitted words were 
restored using only the minimum number of words required for grammatical sense. 

(ii) Total words, W: W was obtained by counting all words in a script after preliminary
examination. Abbreviations (such as ‘&’, ‘@’ and numerals) and contractions were regarded as 
separate words, as were the components of hyphenated compound words. ‘Standard’ 
abbreviations such as ‘etc.’, ‘i.e.’ were counted as single words. 

(iii) Verbatim recall, V: Verbatim recall was scored by counting all the words in a script 
exactly the same as corresponding ones in the original passage. Non-hyphenated compounds had 
to be recalled in full. Abbreviations involving no change in pronunciation were scored as 
verbatim, but contracted forms, or expanded forms of original contractions were not. Isolated 
words or particles still counted as verbatim if the position or context was similar to the original. 
Transposition was no disqualification, and for a word present only once originally, occurrence 
almost anywhere in a reproduction was counted. 

(iv) Intrusions and duplications, I: Intrusions were scored by counting all the words of a 
script not corresponding to or deriving from material in the original passage. Duplications were 
scored as intrusions because they were a very small group, and mostly could not be 
distinguished from intrusions. In general, that repetition furthest from the original location, or 
which occurred second, was counted as intrusive. Isolated or trivial words contained in a 
basically intrusive phrase were often scored otherwise. Such words in other contexts could be 
scored as intrusive, unless they were part of a change in expression; for this reason isolated 
conjunctions were rarely scored intrusive. 

(v) Non-verbatim recall, X: Non-verbatim recall was usually calculated by subtracting V and I 
from W, but defining comments help in borderline cases. It was taken to include many different 
things: any departure from verbatim recall where pronunciation was affected; more or less 
synonymous words and phrases; substitutions of related or derivative meaning, even remote or 
antonymous; substitutions of a word or phrase for a pronoun (or vice versa), provided a
minimum number of words was used; substitution of a pronoun for a pronoun implied but not
stated originally; words involved in changes of expression of other material, but not in
themselves substitutions. As with verbatim recall, considerable transposition of infrequent
material could be tolerated.

(vi) General considerations: A general principle of conservation operates in borderline cases, 
according benefit of doubt to the more ‘accurate’ possibility. In the marking study judges 
showed rather a high degree of inconsistency and many errors. There was also a reluctance to 
score positively (i.e. as V or I) isolated words or particles. These tendencies should be avoided
to achieve reproducibility. 

W was used as one of two measures of ‘quantity of recall’. It does, however, include the 
intrusive component, I, defined as material not obviously recalled from the original passage. It 
might be argued that W-I, i.e. V+X, constitutes a better estimate. This study prefers to use W 
since to exclude I, small and unremarkable as it is, is tantamount to making a priori assumptions 
about the nature of recall. Intrusions might, for instance, be found later to represent original 
material in an unexpected form. In any case, its inclusion made little difference to correlations 
with the other quantity measure, CL (see Results section). 
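
The relations among the word scores can be summarized in a short sketch. It is illustrative only: the function name, dictionary keys and example counts are hypothetical, and the percentage is taken against the 225 words of each original passage.

```python
PASSAGE_WORDS = 225  # length of every original passage, as stated in the Method


def recall_components(total_words, verbatim, intrusive):
    """Derive X and the two candidate quantity measures from W, V and I."""
    non_verbatim = total_words - verbatim - intrusive    # X = W - V - I
    if non_verbatim < 0:
        raise ValueError("V + I cannot exceed W")
    return {
        "W": total_words,
        "V": verbatim,
        "X": non_verbatim,
        "I": intrusive,
        "W_minus_I": total_words - intrusive,             # equals V + X
        "percent_verbatim": 100 * verbatim / PASSAGE_WORDS,
    }


# Hypothetical script: 150 words written, 90 of them verbatim, 12 intrusive.
scores = recall_components(150, 90, 12)
print(scores["X"], scores["W_minus_I"], scores["percent_verbatim"])
```

For this invented script X is 48 words, W − I is 138 words and 40 per cent of the original passage is recalled verbatim.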


Clause scores 


The other scoring unit provided an alternative and more or less independent estimate of 
‘quantity of recall’ by which to assess the usefulness of W. Each clause also represented a 
discrete piece of action or description, convenient for later qualitative analysis. A ‘clause’ was 
defined as any verb, whether finite or not, together with its associated parts of speech. Auxiliary 
verbs, broadly defined, were combined with the main verb. This produces divisions of equal 
grammatical status (unlike Levitt’s, 1956, suggestion of ‘idea groups’), and avoids the biasing 
effects of the favoured recall of different parts of speech (see Gomulicki, 1956; King & Cofer, 
1960; Wearing, 1973). 

Only the original passages were divided into clauses (see Appendix), not subjects’ 
reproductions. Such reproductions were, however, compared with each original clause in turn, 
and that clause was held to be ‘represented’ by the subject if any part of it (other than trivial 
words such as particles or tense-forming auxiliary verbs) was identifiably retained, however 
much altered. ‘CL’ was the total number of such clauses for each script. In borderline cases, it 
was concluded that a clause was represented. 

A disadvantage of clauses is their variable word length, which was necessary to avoid an 
unnatural style in the passages. It is alleviated somewhat by scoring for ‘representation’, and a 
preliminary analysis showed no relation between clause length and frequency of representation. 


Neither quantitative measure makes prior selection of important or ‘key’ points from the 
passages, unlike Cofer (1941) or Howe (1970) for instance. All such a priori analytic constraints 
inevitably limit the conclusions we can draw from results, especially if the psychological 
processes underlying memory are selective too. 


Results 


Means for the five variables, by passage, order and subjects, are summarized in Table 1, which 
includes Anova results and sex differences. The most important point arising is that V 
constitutes both the major share of W and of the variance in W. Indeed, the largest single source 
of variance in the whole study is the variance in V due to differences among subjects. Extreme 
values for the percentage of the original 225 words recalled verbatim were: passage means, 35.1
and 44.4 per cent; position means, 31.1 and 45.0 per cent; subject means, 22.1 and 59.3 per cent;
and for individual scripts, 10.1 and 68.9 per cent.

In the analysis of variance, all factors produced significant effects on the ‘quantity’ measures, 
W and CL, and on V. There were significant effects on X and I from passages and subjects, but 
not from order, a difference that was quite marked. V, X and I, therefore, all appear to be 
components of the differences in recall ability or style between subjects, and of the differences 
in recall produced by different passages, whereas familiarity with the experimental task, 
including a possible ‘fatigue’ element, affects quantity of recall only, through the verbatim 
component. 
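
The degrees of freedom quoted with Table 1 can be checked with a short calculation. The sketch assumes a main-effects-only analysis of 162 scripts classified by nine passages, nine order positions and eighteen subjects; the function name is hypothetical, and the actual Anova model used in the study may have differed in detail.

```python
def main_effects_df(n_obs: int, levels: dict[str, int]) -> dict[str, int]:
    """Degrees of freedom for a main-effects-only analysis of variance."""
    df = {name: k - 1 for name, k in levels.items()}
    # Residual d.f. = total d.f. (n - 1) minus the d.f. used by the factors.
    df["residual"] = (n_obs - 1) - sum(df.values())
    return df


df = main_effects_df(162, {"passages": 9, "order": 9, "subjects": 18})
print(df)  # {'passages': 8, 'order': 8, 'subjects': 17, 'residual': 128}
```

Under these assumptions the residual has 128 d.f., which agrees with the values of 8, 128 and 17, 128 given for the F ratios.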

Sex differences on all variables were small or zero. There was a suggestion that females were
at an advantage on verbatim recall, and males on the non-verbatim component, but t tests on the
differences, using S.D.s on subjects’ individual means (d.f. = 16), gave P > 0.20 (two-tailed) for
each variable.


Table 1. Summary statistics and Anova results for all variables

Statistic                                               W           V

Overall data, n = 162      Mean                       161.5        91.5
                           S.D.                        34.4        28.8
Means for sexes            Males                      160.8        89.2
                           Females                    162.3        93.8
S.D.s on                   Passage means, n = 9         8.8         7.3
                           Order means, n = 9          10.3         9.3
                           Subject means, n = 18       26.3        23.2
Anova results, Fs due to   Passages†                    3.64**      5.96**
                           Order†                       4.79**      8.78**
                           Subjects‡                   13.92**     27.98**

† d.f. = 8, 128; ‡ d.f. = 17, 128.
* P<0.01; ** P<0.001.


Figure 1. Variation of recall components and total recall associated with order of presentation.

Figure 2. Variation of recall components and total recall associated with the passages ordered by ascending
values of W.

Figure 1 shows the variations in the true variables over successive presentations, and Figs 2 
and 3 show such variations when individual passages and subjects respectively are arranged in 
rank order of W. Figure 3 contains more irregularities, partly because subjects are more variable 
and partly because points are the means of nine rather than eighteen items. Figures 1 and 2 are
really two ways of showing the differences among the recall attempts of an ‘average’ subject. It 
can be seen that W and V tend to follow the same courses, whereas X and I more or less 
fluctuate around their means. There are a few exceptions, however. In Fig. 1, the deviations 
from trend in X seem to parallel slightly those in W and V. In Fig. 2, two of the passages are 
markedly peculiar, and in Fig. 3, X begins to decrease for the smallest values of W, two 
variations which probably account for the effects on X and I in Table 1. Without them, only 
verbatim recall would vary in response to any of the three factors in the experiment. Apparent 
vague relations between V and X or I, especially in Fig. 3, are probably an artifact of ordering 
scores by W, where W = V+X+I. 
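
The ordering artifact can be illustrated with simulated scores. In the sketch below the three components are generated independently of one another, so any apparent covariation after sorting by W arises only from the ordering; the means and spreads are hypothetical values loosely of the order seen in the study, and the sample size is inflated to make the effect stable.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical, independently generated components (not the study's data).
n = 3000
v = [random.gauss(90, 25) for _ in range(n)]
x = [random.gauss(55, 10) for _ in range(n)]
i = [random.gauss(15, 5) for _ in range(n)]

# Order the simulated scripts by ascending W = V + X + I, as in Figs 2 and 3.
order = sorted(range(n), key=lambda k: v[k] + x[k] + i[k])
third = n // 3
low, high = order[:third], order[-third:]

# X was generated independently of V, yet its mean rises with W rank.
x_low = mean(x[k] for k in low)
x_high = mean(x[k] for k in high)
v_low = mean(v[k] for k in low)
v_high = mean(v[k] for k in high)
print(f"V: {v_low:.1f} -> {v_high:.1f}; X: {x_low:.1f} -> {x_high:.1f}")
```

Both V and X are higher in the top third of W than in the bottom third, so plotted against rank of W they appear to move together even though they were constructed to be unrelated.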

Figure 3. Variation of recall components and total recall associated with subjects ordered by ascending
values of W. 


In passing, it can be noted that differences in recall of passages associated with a priori
structure were small. In nodal passages, there was a little less variation in the recallability
of clauses, and the favoured recall of material at the beginnings and ends (which gives a sort of
‘serial position’ curve) was less marked than in linear and branching passages.

Intercorrelations among W and its components are shown in Table 2. Those with W are trivial 
in view of the relative contributions of V, X and I to W. More importantly, there seems to be 
little relation between the size of the verbatim and non-verbatim recall components in subjects’ 
reproductions, supporting or reflecting Figs 1-3. There is a small but significant correlation 
between X and I. 
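
For readers unfamiliar with the statistic, a minimal sketch of a Spearman rank correlation follows. It ranks each variable (tied values share the mean of their ranks) and then takes the ordinary product-moment correlation of the ranks; the function names and the five example scores are hypothetical, and the study's own computations may have used a different but equivalent formula.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five scripts, not data from the study.
v_scores = [80, 95, 110, 70, 120]
x_scores = [50, 40, 60, 55, 45]
print(round(spearman(v_scores, x_scores), 2))
```

A rank-based coefficient makes no assumption about the shape of the score distributions, which suits word counts whose spread differs markedly between components.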

Table 3 contains intercorrelations among the three ‘quantity of recall’ measures. The
near-perfect correlations between W and V+X, and their almost identical correlations with CL,
provide additional justification for using W instead of V+X as the main ‘quantity’ variable.


Table 2. Spearman rank correlations among W and its components

           W            V            X

V       +0.86**
X       +0.55**      +0.13
I       +0.26**      -0.10        +0.19*

** P<0.001; * P<0.05.
n = 162, two-tailed tests.


Table 3. Spearman rank correlations among ‘quantity of recall’ measures

           W           V+X

V+X     +0.97
CL      +0.82        +0.83

P<0.001 in all cases, n = 162, two-tailed tests.


Discussion
The verbatim recall component


The most important findings from the present study concern the verbatim recall component. As a
proportion of the original, V is roughly consistent with previous studies, although the scoring
criteria used are probably more exhaustive than usual. It averaged 31 per cent on trial 1 of the
first session, against 20-40 per cent for passages about two-thirds the length (Cofer, 1941; Kay,
1955; Howe, 1970). A surprising figure of 69 per cent was achieved for the best single recall
attempt, but that was not an isolated figure. Thus, as in other studies, words, as well as meaning,
are fairly well retained, contradicting Herriot (1974, p. 73).

Apart from its size, which consistently averages greater than I or X, verbatim recall is also 
important as the main source of variance in the composition of W. With a few exceptions, it is 
the only component varying systematically with total recall. Both inter- and intra-individual 
differences seem to act only on the material reproduced verbatim. Non-verbatim recall and 
intrusions seem more or less immune to such effects. This differential behaviour seems to make 
real differences among underlying processes a more viable proposition than Cofer (1973) would 
imply. 

One serious possibility which could affect the results is that observed verbatim recall has two 
components: one corresponding to the genuine recall of the original words, the other being 
essentially non-verbatim recall which has been put back into the original words by chance. 
Estimating the size of such a ‘quasiverbatim’ component is likely to be difficult, but important. 
Quasiverbatim material should really be treated as non-verbatim, and Figs 1-3 redrawn 
accordingly, but there is no obvious reason why it should not behave like other non-verbatim 
material. 


The units of analysis 


The correlations among W, V+X and CL give added support to the use of W instead of V+X.
The high correlation between W and CL is pleasing considering their differences, and tends to
validate both as quantity measures. It seems desirable that any such unit should yield divisions
of equal status and size. The clause has been preferred to Levitt’s (1956) ‘idea groups’ for
providing the former, but its divisions are of less equal size. There is some extra advantage from
using both clauses and words, however. Words are a much less crude unit than clauses, but
some words are more meaningful in isolation than others, and a single word can often substitute
for a whole phrase and vice versa. Many may be omitted from prose with little change in
meaning. These factors are strongly dependent on context, which introduces difficulties into
using the presence or absence of words to measure ‘quantity of recall’. Perhaps such a measure
becomes meaningless if pursued too far, anyway.

Recall under present circumstances is fairly literal, so that this sort of item-by-item analysis is 
more appropriate than if recall were made more difficult. Furthermore, differences among scripts 
in conciseness of expression looked, superficially, to be small. Taken with the clear results 
obtained, the scheme of analysis used here has been a moderate success. Eventually, however, 
less straightforward analysis will be necessary. The raw material is more than just a sequence of 
elements: there is recurrence of items, a host of implicit interrelations, and many higher order 
components of organization, and while the act of recall itself may to a large extent be sequential,
it cannot be expected to correspond very well to the sequentiality of either the original passages
or subjects’ reproductions. Nor can the same material be expected to mean the same for
different subjects (see Bartlett, 1932). The only solution would seem to be through a fuller and 
more realistic qualitative examination of subjects’ scripts, and of other accessible processes 
during recall. 


Subjects’ interpretation of the task 


The use of terms like ‘error’ or ‘accuracy’ always begs the question of what subjects were 
trying to do during recall, by a comparison, albeit implicit, with some ideal of performance. This 
is clearly unrealistic, but it does lead us to question how subjects interpret such tasks. Care was 
taken in the present experiment to control the setting and instructions so as to dissuade subjects 
from interpreting the task too rigidly, and to ensure a certain uniformity of interpretation. All 
subjects seemed satisfied with the instructions, yet no checks on their agreement or consistency 
were made. One risk of ‘middle-of-the-road’ instructions is that they might inadvertently permit 
both of the extremes they were designed to avoid, and a special investigation to clarify this 
would be desirable.


Conclusions 


The major problem still remains, of explaining why the relations among the observed recall 
components are more or less the same whether they refer to within- or between-subject 
differences. It seems that the psychological differences produced in a single individual by 
experience, fatigue or change of passage are of the same kind, and that these differences are the 
same as those found between individuals. Many reasons may be suggested, but the results here 
can go little way towards determining the relative contributions of psychological and 
methodological factors. The size of these trends indicates, nevertheless, their importance for 
future study. It is hoped that some of the problems of materials and analysis have been eased by 
this paper. 


Appendix: The nine passages; oblique lines mark ends of clauses


1A


One day Ernu decided to hunt the giant armadillo. / He went to his grandfather first / and borrowed some of 
his poison arrows, / then visited the village shrine / and prayed to his tribe's ancestral spirits. / After this he 
walked deep into the forest, / where he slept the night on some dry leaves in a cave. / Early in the morning 
he was wakened by a noise / and crept out of the cave into the moonlight. / At first he could see nothing 
except the misty river banks, / but eventually noticed a humped shape some way off. / Suddenly the shape 
vanished into the forest. / Ernu ran after it. / He plunged into the undergrowth, bow and arrows in hand. / He 
followed the animal’s tracks for over half an hour, / until he came out into a swampy clearing. / He looked 
around for a while / before espying a shadowy depression in the undergrowth: / quickly he fired several 
arrows into it. / There was a loud roar. / He ran over / and found the fabulous giant armadillo, / but it was
already quite dead. / Ernu jumped among the bushes / to skin the monster of its tough, legendary hide. / 
Then he had to drag the bulk back through the forest, / and after many hours reached his tribe’s village. / He 
showed the hide to his grandfather, / who was so proud / that he gave Ernu a fine timber hut. / 


1B


When Trevor’s grandmother died, / she left a long and complicated will. / Three lawyers had to decipher it 
for a month / before concluding that, amongst other things, / Trevor had been left his grandmother’s 
favourite cockatoo. / He took it back to his bed-sit / and placed its cage in the window, / where it sang all 
day and most of the night. / After a week this began to strain Trevor's nerves rather badly, / but after a 
fortnight he could stand it no longer. / At tea one evening he suddenly jumped out of his chair / and dashed 
upstairs. / He returned with an old, voluminous suitcase, / into which he stuffed the cage with the poor 
cockatoo in it. / That night he put on an old raincoat, / stole quietly out of the dark boarding-house, / and 
made for the nearby cemetery. / He quickly found the recently dug grave by torchlight, / and dropped the 
suitcase by it. / From under his coat he brought out a spade / and frantically began shovelling earth from the 
grave, / until the spade struck the wood of a coffin. / Then he threw his spade down on to the ground, /
climbed out of the hole, / and tossed the suitcase to the bottom. / After hastily filling it in, / he heaved a sigh
of relief / and walked thankfully home. / Immediately he went to bed, / and that night slept like a log. / 


1C


One evening I went to a dull party / and I met Mr. Angschmidt, manager of Mechanical Contraptions Ltd. /
The following week he invited me to his factory / and showed me his latest production line. / It began in a
dim workshop, / where a steel plate was pressed into several carved pieces. / Workmen smoothed off the
rough edges / before sending them to a second workshop. / There, a man in white overalls polished the pieces /
and washed them with a special solution. / When they had dried, / he painted them with a tough enamel /
and passed them carefully to his friend on the next bench. / This man took a frame of copper struts / and
carefully attached the steel plates. / This produced a shiny cylinder, / and a boy took it into the electrical
laboratory. / One technician fitted it with an electric motor / and then clipped a fan to the end of the motor. /
Somebody else soldered wires on, / drew them through a hole, / and plugged them into a socket. / The
machine was tested / before being carried to a large assembly room. / A woman bolted a cover over the
base, / attached rubber wheels, / and clipped a bag over the back, / and put the completed product in a
cardboard box. / A machine stamped ‘handle with care’ and a picture on the box: / only then did I recognize
it as a vacuum cleaner. /


2A


Many years ago the Parali people stopped wandering over the hills, / and settled in grass huts / that lay by
the loop of a river, / and which had been built on stilts / to protect them from periodic flooding. / They had
lived here happily for many years, / fishing in the placid waters nearby, / but their contentment was disturbed
one day / when someone pointed out / that every time the river overflowed / it weakened the precarious
bamboo stilts under the huts / and washed away some of the soil. / The chief of the tribe too began to
worry: / he was very happy there / and didn’t want to move. / He called a gathering of all the men / to
discover urgently / how many of them thought / that the erosion of the soil had become such a danger / that
their huts might any day tumble into the river. / The men decided to evacuate the village at once. / They
gathered their families and goods from their dwellings, / to be loaded on to wooden carts / which had been
idle since the nomadic days long ago. / Finally, when the village was empty of people and possessions, / the
medicine man chanted a long sad song / and set fire to the grass roofs, / while his son beat furiously on the
drums. / Then the Parali and their belongings moved slowly off into the forest, / to become nomads again. /


2B

Mrs Taylor had taken her two children to a toyshop, / so she could find out / what they wanted for Christmas 
presents. / They entered the shop through glass doors / and soon stood in front of a large display of toy 
soldiers. / Some of them had been stood alone on shelves, / others engaged in mortal combat, / raising 
bayoneted rifles high above their heads / as if to pierce each other through the heart. / Mrs Taylor moved 
on, / though her children didn't want to. / Then they found the electric trains / - a huge table was given
over to them, / where they purred round and round all day, / some pulling passenger carriages between
miniature stations, / others shunting wagons between various sidings. / But the children had no wish to
watch trains, / and pulled their mother over to another stand / where a toy spaceship emitted lights and
noises, / and some other small machine ground over an imitation lunar landscape. / Mrs. Taylor
waited patiently / while son and daughter ran from one display to another / just to see how the marvels
of the second / exceeded those of the first. / Sadly she realized that thing / which they in their delight
had forgotten all about. / They would have to do without expensive presents this Christmas, / now that
their father had died, / leaving them with no means of support, / and making their home very quiet and 
lonely. / 


2C 


In trying to shave one morning, / which is always a dismal prospect before breakfast, / I found to my 
surprise, / on switching on, / that the motor made a most disturbing grating sound / which alarmed me at 
first. / Indeed, I had never heard its like before. / I took the back of the razor off / to look inside for anything 
amiss, / when a dozen tiny curlicules of metal fell out / and disappeared into the carpet. / I then showed the 
razor to a friend, / who knew a lot about such matters, / or so he let others believe. / He said he didn’t like 
the look of the steel fragments, / and then he took the back off, / whereupon some pieces of charred plastic 
rattled to the floor, / alarming me even more, / because there couldn't have been much left inside by then. / 
But my friend placed the razor on the table, / where the sunlight glistened on the rust. / He gave me a few 
words of advice: / I should have found out long ago / how to use an electric razor, / and how to manage 
without the soap and razor blades / which had had such a deleterious effect. / I walked home disheartened /
and went to have a shave in the bathroom, / getting out an old cut-throat with my left hand, / and with my 
right tossing the battery razor through the window. / 


3A


In the beginning the Thunder God created an island in the sea. / His three sons lived in its mountains: / the 
Rain God sulked in his mass of clouds, / the Fire God sat in the summit of a volcano / and the Stone God 
rumbled in a ravine. / The foothills were covered with forests / where many serpents lurked, / thinking evil 
thoughts. / Unicorns appeared on the plains, / and ran in swift herds between the river and the forest. / At 
the river they drank the deep, cool water, / and in the forest they ate roots and wild berries / which sprang 
like magic from the dank undergrowth. / In a cave by the sea lived a dragon / who came out once a year / to 
hunt for a mate / and to chase the unicorns and serpents. / Men too were created. / They built themselves a 
village of log huts, / and set up a council / which consisted of the oldest and wisest, / to commit to writing 
the first laws. / A single river descended from the mountain-slopes / and ran through forests and plains, / to 
merge with the sea beyond the island’s cliffs. / The sky above was often the clearest blue, / but sometimes 
filled with storm-clouds / and at others the black specks of birds could be seen / calling to each other over the 
sea. / A thousand years hence the island will be entirely destroyed. /


3B 


It was indeed a beautiful house. / The decorators had tried their best with the decor; / each room represented 
a different period: / one saw Classical, Georgian and ultramodern rooms immediately adjacent. / The 
plumbers had installed a solid silver bath / and connected it to unbelievably quiet water-piping, / hidden from 
sight, / which was to win an important industrial award. / Glittering taps projected from the foot of the bath. / 
The kitchen had been uniquely fitted out: / one wall housed a deep-freeze the size of a small room, / and
the floor was supposedly self-cleaning. / The builders had taken trouble / to enhance the walls / by fusing
their surfaces with oxyacetylene torches / so that they acquired a glass-like finish, / and by using blue-tinted 
concrete. / Heating was provided by large ceiling panels / which were no fire-hazard / due to their low 
temperature. / The architects had chosen the site of the house, / and had positioned it carefully in relation 
to the terrain, / so that it nestled in its landscaping / as a chick snuggles in a hen’s nest. / The site also 
provided the maximum protection from the elements. / The nurserymen had been hired from a botanical 
gardens, / and they planted many shrubs, / distributing them in clusters / so as to lend an almost subtropical 
air to the setting. / Both bride and groom were overjoyed with their new home. / 


3C 


Tempotranspo’s time-machine has been designed with great attention to detail. / The operator climbs in 
through a forward hatch / and sits on a plush, ventilated seat. / His feet rest on pedals on the floor: / the left 
one can be used as an emergency time-brake, / whereas the right one dissociates the machine from the 
present. / Passengers climb in through the rear hatch / and sit on equally luxurious seats. / These tip back / if
the occupant wishes to sleep. / The time engine itself is located in the middle of the machine / and draws its 
power from batteries / located in the lower bodywork, / which may be recharged occasionally from the 
mains. / The outside is moulded from a special laminated plastic / and can resist extremes of cold and heat / 
without becoming brittle or tarnished. / In front of the pilot is housed the computer, / specially designed by 
Plessey. / It can control the time-machine quite automatically / which relieves the pilot of many 
responsibilities / and controls travel more accurately. / The superstructure is of an aluminium alloy / and was 
constructed by Hawker Siddeley. / It is welded to the body-work and leg-struts. / Operation is quite simple / 
and is described in a detailed handbook. / Alternatively, the intending purchaser may attend a course of 
lessons / at the end of which he sits for a diploma. / Tempotranspo expect an expanding market for their 
time-machine. / 


Level I and Level II abilities: Some theoretical reinterpretations


Ronald F. Jarman 





Some of the major assumptions and premises of the theory of Level I and Level II abilities are examined
using a recent model of cognitive abilities known as simultaneous and successive syntheses. It is found that
the empirical evidence for Level I and Level II abilities can be interpreted parsimoniously using the
simultaneous-successive model. The result of these reinterpretations is both increased clarification of some
previously vague aspects of the Level I-Level II theory, and greater understanding of the relationship of the
theory to some current themes in cognitive psychology. 


There have been few theories in the psychology of individual differences that have attracted 
attention equal to that paid to Arthur Jensen’s model of Level I and Level II cognitive abilities.
With respect to the implications of the model for public policy, Cronbach (1975) suggests that the 
controversy surrounding the Level I-II theory may be unparalleled in the history of mental 
testing. 

Despite the number of reactions surrounding Arthur Jensen’s views, however, relatively few of
these have addressed the basic premises incorporated in the theory in the light of other current 
theories of cognitive abilities. The Level I-II theory was first proposed by Jensen in a National 
Academy of Sciences address (Jensen, 1968), later expanded drawing upon results from Project 
Headstart (Jensen, 1969), and then contrasted with some other models of cognitive abilities 
(Jensen, 1970 a). Several of the solicited reactions to the Harvard Educational Review paper
touched on substantive issues (e.g. Cronbach, 1969; Elkind, 1969; McV. Hunt, 1969) and some 
other partial analyses have appeared subsequently (e.g. Rohwer, 1971), but the major reactions 
to the theory have been sociopolitical, genetic and educational in substance. Further, during this 
period and the subsequent time, the theory itself changed little. From 1970 forward, the bulk of 
Arthur Jensen’s research has been concerned with determining population parameters using the 
model in essentially its original form (Jensen, 1970 b, 1973 a, b, 1974, 1975; Jensen &
Frederiksen, 1973; Jensen & Figueroa, 1975). Other researchers appear to have either ignored 
the theory completely (for an interesting survey in this regard see the papers in Riegel, 1973) or 
observed in passing that their results in a study do not mesh with one or more of its premises. 
An effort to interpret the considerable empirical information that Jensen has amassed and to 
integrate it into the present literature on cognitive abilities has not been attempted to date. 

The purpose of this paper is to reinterpret Jensen’s theory by examining its essential 
assumptions. A model of cognitive abilities recently proposed by Das, Kirby & Jarman (1975) 
will be used for this purpose in addition to related literature. The rudiments of both theories will 
be stated first, followed by a discussion of four broad areas of reinterpretation. These areas 
comprise: quantity versus type of information processing, internal relationships between 
abilities, the utility of viewing abilities within a framework of cognitive strategies, and the 
relative discriminating powers and developmental differentiation of Levels I-II.


Theoretical positions 
The most extensive and lucid description of the theory of Level I and Level II abilities is in 
Jensen (1970 a). The distinction between the two types is described as:


Level I ability is essentially the capacity to receive or register stimuli, to store them, and later to recognize
or recall the material with a high degree of fidelity. . . . It is characterized by the lack of any need of
elaboration, transformation or manipulation of the input in order to arrive at the output. . . . Level II ability
. . . is characterized by transformation and manipulation of the stimulus prior to making the response (Jensen,
1970 a, pp. 155-156).

These two types of abilities form the poles of a continuum along which a given cognitive task
may be placed. Level I ability is synonymous with memory and Level II ability is reasoning and
problem solving. The meaning of this continuum, then, derives from the amount of information
transformation that is required in these two abilities. The nature of what is transformed is
handled in the theory by a second orthogonal dimension representing cultural content with
‘culture-free’ and ‘culture-loaded’ as the dimension poles. The latter dimension is suggested
by Jensen (1970 a, 1973 b) to correspond to Cattell’s (1971) fluid and crystallized abilities
respectively.

An alternative theory of mental abilities to those presently available has been proposed by Das 
et al. (1975), and is known as simultaneous and successive syntheses. Historically, this theory 
was derived in part from research results which were inconsistent with the Level I-II model 
(Das, 1972; cf. Das, 1973 a, b). The major impetus for the development of the theory, however,
was its broad appeal in terms of current research in serial and parallel processes (Neisser, 1967), 
language (Kirby, Jarman & Das, 1976), and the major themes in cognitive psychology 
summarized by Ornstein (1972) regarding the roles of the hemispheres in the brain (cf. 
Gazzaniga, 1970; Levy, 1974). 

The theory of simultaneous and successive syntheses draws heavily from Luria’s (1966 a, b,
1973 a) Russian clinical research. Summarily, the theory posits that human information 
processing may be described in terms of a model containing four components: external input, 
sensory registration, central processing, and output. Stimuli may be presented for external input 
in either a simultaneous or successive manner. The stimuli are immediately subject to sensory
registration and, depending upon the nature of the task, may be passed on for central processing.
The processing in the central unit may take one of two fundamental forms: simultaneous 
synthesis or successive synthesis. Simultaneous synthesis refers to the organization of 
information into composites, such that the relationship of elements to one another can be 
determined. This organization may be spatial, for example, or it may be represented in speech in 
complex logical-grammatical structures. In contrast, successive synthesis is a form of 
information organization which does not allow analysis of the relationship of multiple elements 
to one another. Rather, information is organized in a temporal, sequence-dependent form. 
Simultaneous and successive syntheses are combined with a planning and decision-making 
component in the central processing unit with reciprocal relationships between them. Planning 
and decision making is dependent upon the two forms of synthesis, and also determines the form 
of synthesis for some tasks. Finally, the output unit uses the information organized by the 
central processing unit for task completion. 

The relationships between the units in the model are proposed as independent, with changes in 
the form of information organization in the units according to task demands. Information may be 
presented successively, but processed simultaneously. Hearnshaw (1956), for example, describes 
a series of tasks which demand simultaneous organization of discrete elements presented over 
time. Conversely, information may be presented simultaneously, and processed successively, as 
in the well-known dichotic listening paradigm (e.g. Kimura, 1973; Schulhoff & Goodglass, 1969).

The emphasis, therefore, in the theory of simultaneous and successive syntheses is placed on 
the forms of cognitive representation used in task performance. The type of information 
processed (Cattell, 1971) and the amount of information transformation (Jensen, 1970 a) will vary 
in each of the forms of synthesis. Thus, the theory attempts to come to grips with the processes 
underlying cognitive ability measures and, in this sense at least, is consistent with other recent 
efforts (e.g. Messick, 1972; Estes, 1974; Carroll, 1976). 


Quantity of transformation vs. types of processing 
Some of the basic differences between the theory of Level I and Level II abilities and 
simultaneous and successive syntheses are due to differing emphases on amount of information 
transformation during task performance as opposed to types of transformation or cognitive 
processes. 

Considering Level I abilities first, tasks which tap these abilities most heavily are said to 
involve little or no information transformation. An example of these tasks given by Jensen (1969, 
1970 a) is the well-known digit-span test. Other tests of Level I include primarily associative 
tasks such as serial learning and rote learning of paired associates. 

This conception of human memory is inconsistent with both the Das et al. (1975) model, and 
other memory research in at least four respects. First, as noted by Lawson & Jarman (1977), to 
distinguish memory ability from other abilities by an increasing lack of transformation as tasks 
become pure measures of the former, is incongruous with other current memory models. A 
major thrust of memory studies today is to explicate the nature of the heuristics, transformations 
and strategies used by subjects in memory tasks. This is especially the case, of course, in the 
more complex tasks such as serial and free recall, which are the types used by Jensen to 
distinguish Level I from Level II. In terms of theoretical stance, even the most associationist 
points of view are addressing this problem (e.g. Voss, 1972). 

Within the Das et al. (1975) model, memory processes are comprised of two basic varieties, 
both simultaneous and successive. This is explicit in operational terms through their data which 
indicate that successive processes in memory are tapped by tests such as digit span, serial 
recall, recall of abstract paired associates and visual short-term memory. Simultaneous forms of 
memory are tapped by tests like the Memory for Designs test (Graham & Kendall, 1960) and 
recall of concrete paired associates. The Level I-II model groups these tests together as 
measures of Level I, thus overlooking the Das et al. results which indicate that they are of two 
different varieties. The simultaneous-successive model allows for different quantities of 
processing within each type and between the types, and is supported by current research on 
differential processes in memory, such as Paivio’s dual coding hypothesis (Kirby & Das, 1976; 
Paivio, 1975, 1976). Also, these results obtained by Das and his colleagues are consistent with 
Luria’s (1966a, b, 1973 a) discussion of mnestic processes. 

A second point regarding the definition of Level I or memory as lack of information 
transformation, is that this definition creates some internal inconsistencies in the theory itself. In 
several recent publications Jensen has tested his original hypothesis that some subjects may use 
Level II abilities to perform memory tasks. For example, Jensen & Frederiksen (1973) assumed 
that SES differences in categorized lists in free recall were due to the use of Level II abilities, 
and that smaller differences in uncategorized lists were due to comparable Level I abilities in the 
SES groups. Similarly, Jensen & Figueroa (1975) argued that Level II is the predominant ability 
used in backward digit span, and Level I is used in forward digit span. 

The evidence indicating that memory tasks are performed using various cognitive abilities 
raises questions regarding the validity of labelling the Level I position on the continuum as 
memory. Specifically, the memory-reasoning distinction becomes questionable when it is 
recognized that memory tasks may involve Level II or reasoning abilities. This difficulty is 
mitigated somewhat by Jensen (1970a) in his assignment of the alternate label ‘Conceptual 
Learning’ to the Level II pole. In other publications however, the problem is evident (e.g. 
Jensen, 1973 b). 

In the model of simultaneous and successive syntheses, recognition of the effects of task 
demands in evoking different forms of memory organization is explicit. It is the forms of 
organization that create a continuum, not whether the task represents associative or conceptual 
learning. In the case of backward and forward digit span, for example, Das & Molloy (1975) 
demonstrated that forward digit span is performed by successive synthesis, as would be 
expected because retention of serial order is important (see Gianutsos, in press, for a discussion). 
Backward digit span loaded on a factor that they designated as ‘Spatial Imagery’, which 
appeared to bear a relationship to simultaneous synthesis. 

A third point with regard to Level I deals with the role of control processes in memory. In 
some respects, Jensen appears to view abilities as rather immutable and tied directly to tasks. In 
other respects however, he appears to include the role of executive or control processes through 
his acknowledgement that some tasks may be performed via different abilities by different 
individuals or groups: ‘Some tasks lend themselves to being learned on an associative level or on 
a conceptual level, and different learners may prefer one or the other approach, so that in one 
population a test may stand at a different point on the complexity continuum than in another 
population’ (Jensen, 1970a, p. 155). 

In this regard also, the simultaneous-successive model appears to be more consistent with the 
current emphasis on subject-initiated strategies and control processes in memory (see Brown, 
1974, 1975; cf. Hagen, Jongeward & Kail, 1975). The model proposed by Das et al. includes a 
component for planning and decision making which is based on Luria’s observations that this 
function is cortically distinct from the two forms of synthesis. Indeed, the role of this function 
in memory is only part of its overall significance in cognitive psychology (Anderson, 1975). 

A fourth and final point regarding Level I in its distinction from Level II is that increasingly 
there appears to be a belief that it is dysfunctional to attempt to consider memory as divorced 
from other cognitive processes (Jenkins, 1974; Norman, 1973; Piaget & Inhelder, 1973; Brewer, 
1974). The essence of this concern appears to be that memory processes interplay with so many 
other aspects of cognition, that it is artificial to set them aside as a discrete unit for analysis. 

The simultaneous-successive model also appears more appropriate in this respect. Memory 
processes interact with perceptual and conceptual processes, and it is the forms of synthesis that 
are of primary importance, not the division into specific abilities. An examination of the 
simultaneous—successive factors reported by Das et al. (1975) and Jarman & Das (1977), 
reveals that elements of mnestic, perceptual and conceptual abilities are represented together 
on the factors. For example, a memory test, Memory for Designs, unites consistently with two 
tests that do not include memory, Figure Copying and Raven’s Progressive Matrices, to denote 
the simultaneous factor. This combination of tests likely includes aspects of all three of mnestic, 
perceptual and conceptual processes. 

Difficulties in the Levels theory are not confined to the four points noted above concerning 
Level I. The definition of Level II ability can also be seen as open to question, largely as a 
derivative of its definition by the continuum of quantity of transformation. 

Pure Level II tasks ostensibly demand information transformation to a high degree. The tests 
that have been used frequently by Jensen to measure Level II are Raven’s Progressive Matrices 
and Figure Copying. Research on the nature of Level II as measured by these tests demonstrates 
some inconsistencies with an interpretation of performance using transformation quantity to 
define the cognitive processes tapped by the tests. While the majority of Jensen’s work has used 
analysis of variance, a factor analytic study was reported recently (Jensen, 1973 b). 
Approximately 2000 subjects of White, Negro and Mexican background were sampled in grades 
4, 5 and 6. The ethnic groups were combined and the results factor analysed for each grade. A 
factor defined by the non-verbal section of the Lorge-Thorndike, Raven’s Progressive Matrices 
and Figure Copying was designated as Level II. This identical factor has been found by Das et 
al. (1975) with the addition of several memory tests, most notably Memory for Designs, and 
designated as simultaneous synthesis. 

The Das et al. results indicate then, that the correlative relationships between Raven's 
Progressive Matrices and Figure Copying lie more in the simultaneous organization of 
information than in quantity of transformation or memory for information. Further, this 
relationship need not be entirely non-verbal and spatial. Bock (1973), for example, has argued that 
grouping relationships is the predominant source of individual differences in Raven’s Progressive 
Matrices and the test contains verbal components also (cf. Burke & Bingham, 1969). Luria 
(1973 a) views simultaneous synthesis as partly spatial, but also includes elements of language 
and logic, an interpretation which is quite consistent with Bock’s (1973) discussion. 


Functional dependence vs. independence 


A basic premise in the Level I-Level II theory is that there is a functional relationship between 
the two abilities. In the initial formulation of the theory, Jensen (1969, 1970 a) proposed that 
Level I ability is a necessary but not sufficient condition for Level II ability. The two abilities 
form a hierarchy which is similar to White’s (1965) proposal that mental organization is 
composed of an associative layer and a second higher cognitive level, with the latter 
superimposed on the former. 

Jensen derived the functional hypothesis in part, from the argument that memory is necessary 
for reasoning, but not vice versa. He suggested that a certain degree of short-term memory ability 
is necessary in order to retain the elements of a reasoning problem while the solution is being 
derived. Correlations between tasks representative of the two types therefore, were 
hypothesized as at least moderate, in order to reflect this relationship. 

The hypothesis of functional dependence has not been supported by studies conducted since 
the formulation of the theory. As early as 1971, Rohwer suggested that the evidence in this 
regard was mixed, at best. In later research, Jensen (1974) concluded that the functional 
relationship was not as strong as first suggested: ‘...there does not appear to be evidence of 
any strong degree of functional dependence between the abilities; quite low or high scores on the 
one ability are not incompatible with a high or low score on the other, though there is a 
tendency for low intelligence-high memory to be more frequent than the opposite combination 
of abilities, especially for nonverbal intelligence’ (p. 111). Finally, Horn (1976) reviewed 
unbiased data concerning the assumption of functional dependence of Level II on Level I, and 
concluded that there is no support for this assumption. 

Although it has not been explicitly stated by Jensen, it is evident that the tasks used most 
heavily in his research would yield a conservative test of the functional hypothesis. The Level II 
tests, Raven’s Progressive Matrices and Figure Copying, both place no memory demands upon 
the subjects. In both tests all information is available visually throughout completion of the task. 

The question may be raised therefore, whether a functional relationship could be demonstrated 
using a reasoning task which has memory components. A well-known test of this type is the 
three-termed syllogisms problem. This task involves serial ordering of three terms (X, Y, Z) 
given two statements about their relative position to one another on some dimension (e.g. X is 
larger than Y; Z is smaller than Y; which is largest?). The syllogisms paradigm has attracted a 
good deal of attention regarding the cognitive processes involved in the task (Clark, 1969a, b, 
1971, 1972; Huttenlocher, 1968; Jones, 1970; Huttenlocher & Higgins, 1972). 
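
The logical structure of the three-term problem can be made concrete with a short sketch. It is offered only as an illustration of the task demands, not as a model of how subjects solve the problem; the function name `order_terms` and the tuple encoding of the statements are assumptions introduced here.

```python
from itertools import permutations

def order_terms(statements):
    """Return the terms ordered from largest to smallest.

    Each statement is an (a, relation, b) tuple, read as
    "a is <relation> than b", with relation 'larger' or 'smaller'.
    """
    terms = sorted({t for a, _, b in statements for t in (a, b)})
    # Normalise every statement to a (bigger, smaller) pair.
    pairs = [(a, b) if rel == "larger" else (b, a) for a, rel, b in statements]
    for candidate in permutations(terms):
        rank = {term: i for i, term in enumerate(candidate)}  # 0 = largest
        if all(rank[big] < rank[small] for big, small in pairs):
            return list(candidate)
    raise ValueError("statements are contradictory")

# The example from the text: X is larger than Y; Z is smaller than Y.
problem = [("X", "larger", "Y"), ("Z", "smaller", "Y")]
ordering = order_terms(problem)
largest = ordering[0]
```

The sketch returns the first ordering consistent with both premises. Both premises must be held and combined before the answer is available, which is the memory component referred to above.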

Cummins (1973) found that correlations between this task and free recall tasks are 
insignificant. A factor analysis of these tests with other memory and reasoning tests (see Das et 
al. 1975, pp. 95-96) yielded two factors which Cummins found were uncorrelated and hence best 
rotated by varimax. Cummins’s interpretation of the two factors was that they represented 
simultaneous and successive syntheses. Interestingly, he found that for both free recall and 
paired associates, concrete word tests tended to correlate more highly with the syllogisms test 
than versions of these tests which used abstract words. The significance of these data is that 
correlations between the memory and reasoning tests were due more to the common 
spatial-imagery processes incorporated in the tasks than a general dependence of reasoning upon 
memory. 

It appears, therefore, that when tasks are classified on the dimension of Level I-II as 
suggested by Jensen, there is little or no functional dependence between them. Correlations 
between some memory and reasoning tasks are due to common processes used, not to a 
functional dependence. Conversely, when a set of tasks which demand simultaneous integration, 
such as Raven’s Progressive Matrices and Figure Copying, are correlated with a second set 
which demand different processes, namely successive synthesis, the correlations across the sets 
approximate zero. 

These research findings are consistent with Luria’s (1966a, b, 1973 a) original formulations of 
simultaneous and successive syntheses. Luria treats these two types of processing as of 
equivalent status, with neither subservient to the other. Das et al. (1975) have found that the 
factor analytic representations of these are best articulated through varimax rotations, and 
therefore they are orthogonal. Finally, Jensen (1973 b) reports factor analysis results which 
further support the independence of these forms of synthesis. Jensen found that varimax and 
oblique rotations yielded the same results and therefore Level I was orthogonal to Level II. A 
direct interpretation of his results is that the factors represent successive and simultaneous 
synthesis respectively. 
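
For readers unfamiliar with the rotation referred to here, the following is a minimal sketch of a varimax rotation, assuming NumPy and the common SVD-based iteration. It is not the procedure used in any of the studies cited, and the loading matrix is hypothetical, chosen only to show the mechanics. Because varimax is an orthogonal rotation, the rotated factors remain uncorrelated by construction; the argument in the text rests on the further observation that an oblique rotation, which permits correlated factors, gave the same solution.

```python
import numpy as np

def varimax(loadings, max_iter=200, tol=1e-10):
    """Rotate a (tests x factors) loading matrix by the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion, mapped back onto the
        # orthogonal matrices through a singular value decomposition.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings for six tests on two factors.
unrotated = np.array([
    [0.70, 0.40],
    [0.65, 0.45],
    [0.60, 0.35],
    [0.60, -0.45],
    [0.65, -0.40],
    [0.55, -0.50],
])
rotated, rotation = varimax(unrotated)
```

The rotation matrix is orthogonal, so each test's communality (the row sum of squared loadings) is the same before and after rotation; only the distribution of loadings across the two factors changes.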


Abilities vs. strategies 


A further topic which requires examination in the theory of Level I-II abilities, is a necessary 
distinction between cognitive abilities and strategies. A major premise of the levels theory is the 
definition of abilities by the use of the continuum of information transformation, as discussed 
previously. Operational measures of this continuum are conjectured by Jensen through the 
location of various tests diagrammatically in the two-dimensional space defined by Level I-II and a 
culture-free—culture-loaded axis (see Jensen, 1970a, p. 154). 

An important addendum to this framework for Level I-II is Jensen’s (1970 a) discussion of 
tests that cannot easily be located in the two-dimensional test space. Jensen notes, for example, 
that paired associate learning may be performed by subjects from varying backgrounds through 
the use of differing degrees of Level I and Level II. More generally, ‘Persons tend to use the 
abilities they’ve got, and so we find some subjects approaching what for most subjects is a Level 
I task as if it were a Level II task’ (Jensen, 1970a, p. 157). Ostensibly, the converse is true also: 
Level II tasks may be approached by some subjects using Level I abilities. 

The implications of the difficulty in classifying some tests absolutely may be best analysed 
through the proposal here of a continuum of specialization of cognitive strategies for the 
classification of tests. This continuum has as its poles, homogeneity and heterogeneity of 
strategy type.* 

Homogeneity of strategy type refers to tests which are uniformly responded to by different 
groups of subjects. With increasing degrees of homogeneity of strategy, the same abilities are 
used by different groups of subjects, that is, a common cognitive strategy is characteristic of 
multiple groups from varying populations in performing the task. Tests of this type would 
involve task demands such that the major dimension of individual differences would be formed 
by facility in one and only one cognitive process. The manifestation of homogeneity of strategy 
in factor analytic terms would be the stability of factor loadings for a test in multiple analyses 
across populations and would correspond to the common use of the term ‘reference test’ to 
identify a factor. Homogeneous tests are the type that Jensen has classified in his 
two-dimensional test space. 
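
One conventional way of quantifying the stability of factor loadings across populations is Tucker's congruence coefficient. The sketch below is offered as an illustration only: the coefficient is a standard index swapped in here, not a measure proposed in the present paper, and the loadings are hypothetical values invented for the example.

```python
import numpy as np

def congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors."""
    a = np.asarray(loadings_a, dtype=float)
    b = np.asarray(loadings_b, dtype=float)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Hypothetical loadings of the same four tests on one factor
# in two populations (values invented for illustration).
group_1 = [0.80, 0.75, 0.70, 0.10]
group_2 = [0.78, 0.72, 0.74, 0.15]
stable = congruence(group_1, group_2)

# A pattern in which the tests defining the factor have changed.
shifted = congruence(group_1, [0.10, 0.15, 0.20, 0.80])
```

A coefficient close to 1 indicates that the same tests define the factor in both populations, which is the sense in which homogeneous tests serve as reference tests. A markedly lower value indicates that the loading pattern has shifted.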

In contrast, heterogeneity of strategy would refer to those tests for which different abilities are 
used by different groups in task performance. By virtue of their structure, these tests would 
allow more than one possible major dimension of individual differences, because alternate means 
for task completion are possible. These would include the type of tests discussed by Jensen as 
difficult to classify. Examples of tests of this type would include those which have a tendency to 
shift markedly in cross-cultural factor analytic work (see Vernon, 1969; MacArthur, 1973) and 
therefore prove somewhat disconcerting in theoretical rationales which attempt to tie tests 
directly to various cognitive abilities. 

* The concepts of homogeneity and heterogeneity are developed briefly here. A separate paper devoted solely to these concepts and their implications in terms of reliability and validity is in preparation by the author. 

The continuum of homogeneity-heterogeneity allows a conceptualization of the distinction 
between abilities and strategies. Tests which are highly homogeneous involve the use of the 
same abilities in multiple populations, that is, the employment of common strategies. In factor 
analytic terms, the identification of a factor by loadings from tests which are all homogeneous, 
can be either as an ability, or simply, a strategy. Alternate means for task completion are not 
available in these tests and therefore abilities and strategies are synonymous. Highly 
heterogeneous tests will load on different factors in different populations. Reference to a factor 
as a strategy becomes particularly important in this case therefore, because the factor represents 
a dimension of individual differences in a type of cognitive process, and the loading of a 
heterogeneous test on that factor represents the choice or propensity by subjects in that group to 
employ that process in the task and subsequent individual differences in the effectiveness of its 
use. 

This concept of homogeneous and heterogeneous tests may be represented schematically 
through ‘band widths’ in a test space. A narrow band width is characteristic of a homogeneous 
test and the location of the band indicates the cognitive process which is the major source of 
individual differences in the task. With increasing heterogeneity, bands would broaden to 
designate the range of processes from which a dimension of individual differences may be 
formed for a particular group. 
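
The band-width notion can be sketched numerically under stated assumptions. In the example below, each test is given a hypothetical position on a single axis for each of several groups (-1 for purely successive, +1 for purely simultaneous), and the band is taken as the range of those positions. Both the scoring of positions and the use of the range are assumptions made for illustration; the paper itself presents band widths only schematically.

```python
from dataclasses import dataclass

@dataclass
class Band:
    low: float
    high: float

    @property
    def width(self):
        return self.high - self.low

    @property
    def midpoint(self):
        return (self.low + self.high) / 2

def band(positions):
    """Band spanned by one test's axis positions across groups."""
    return Band(min(positions), max(positions))

# Hypothetical axis positions of two tests in three groups
# (-1 = purely successive, +1 = purely simultaneous).
positions_by_test = {
    "homogeneous test": [0.80, 0.85, 0.78],
    "heterogeneous test": [-0.60, 0.10, 0.70],
}
bands = {name: band(p) for name, p in positions_by_test.items()}
```

Width and midpoint are computed separately, so a test may lie near the centre of the axis and still have a narrow band, provided its position is the same in every group.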

Figure 1. Revised two-dimensional test space for Level I-II abilities. 


In Fig. 1, Jensen’s two-dimensional axis system is drawn in a modified form with simultaneous 
and successive syntheses replacing Level II and Level I abilities respectively. The band widths 
are shown for some tests, including Raven’s Progressive Matrices, Figure Copying, Serial Recall 
and Digit Span. It should also be noted that band width is not a function of the extent to which a 
test demands either simultaneous or successive synthesis. Tests may involve degrees of both, 
and therefore have the mid-points of their bands placed toward the centre of the 
simultaneous—successive axis. However, if they consistently evoke the use of the same strategies 
across multiple populations, they will also be designated by a narrow width, even though they 
are located toward the centre of the axis. 

Turning again to the theory of Level I-II, it is evident that some of the concepts proposed 
above are alluded to, but not fully developed. For example, the rationale for research such as 
Jensen & Figueroa (1975) in which it was demonstrated that forward and backward digit span 
interacted with age and race implies the continuum of homogeneity-heterogeneity, but it is not 
explicit. Independent studies which have examined Jensen’s theory also have not broached this 
issue. Vernon & Mitchell (1974), for example, adopted a straightforward abilities approach to 
Jensen’s model and did not invoke the concept of strategies. 

What are the implications of the proposal of the dimension of homogeneity—heterogeneity, and 
how well is this proposal accommodated within the current model of simultaneous and 
successive syntheses? The answer may be divided into both methodological and theoretical 
considerations. 

Methodologically, a major thrust of future research related to Jensen’s model of human 
abilities should be the use of the tests which have been identified by Das et al. (1975) to be 
highly stable (homogeneous) across multiple population factor analyses, in order to pursue two 
aims. The first, an extension of the present research, is to examine various group differences in 
abilities and the sources of these differences. The second, and more innovative, is to capitalize upon 
the stability of these tests in their intercorrelations with one another, in establishing a reference 
system within which to infer the cognitive strategies used by various groups in heterogeneous 
tests. This latter technique was used to considerable advantage by Jarman & Das (1977) for 
example, in a study of simultaneous and successive syntheses among different intelligence 
groups. Sixty grade 4 boys were selected from each of three ranges of IQ and administered the 
battery of tests of simultaneous and successive syntheses. Additionally, a cross-modal matching 
task was administered to all subjects; this is a test that has shown considerable factor loading 
variation in previous research by Das and his colleagues. In the results, the factors for 
simultaneous and successive syntheses emerged clearly in all three groups, attesting both to their 
existence in the groups and the homogeneity of the simultaneous—successive battery. The 
cross-modal matching test, however, was characterized by very marked factor loading variations 
and hence strategy differences among the groups. This test is quite heterogeneous, which has 
precipitated the current mixed opinion regarding the cognitive processes tapped by a task of this 
variety (see Freides, 1974). Returning to Fig. 1, this test is shown schematically as having a wide 
band relative to the simultaneous-successive dimension, with the band located in the culture-free 
area because the test contains only single tones and displays of simple dots. 

Systematic research using heterogeneous tests such as the cross-modal task, in combination 
with highly homogeneous tasks to form a reference system, could be a valuable alternate or 
complementary technique to research such as Frederiksen’s (1969) excellent analysis of cognitive 
strategies in verbal learning, which used a post-test report technique. In cross-cultural research, 
where questions of differential strategies are magnified, this technique could complement 
existing methods (see Cole & Scribner, in press). In all areas of application subsequent stages in 
research should include confirmation of strategy types through more rigorous experimental designs 
and the discovery of the environmental and historical determinants of various strategy patterns 
in different groups. 

The theoretical implications of the proposed homogeneity-heterogeneity dimension are given 
by the structure of the central processing unit in the model of information integration proposed 
by Das et al. (1975). In this unit, both simultaneous and successive forms of synthesis are 
conjectured for each of the mnestic, perceptual and conceptual levels. The central processing 
unit also includes a third component, designated as planning and decision making, for which 
there is no counterpart in the Level I-II model. The role of this component is reciprocal vis-à-vis 
the first two. That is, the planning and decision-making aspect of central processing uses 
information which has been integrated simultaneously and successively, and conversely, this 
component is responsible for the selection of a simultaneous or successive form of integration. 
In terms of the current literature in cognitive psychology, the role of this unit would be highly 
comparable to an executive function (Anderson, 1975) and therefore can be related to some 
computer models of human cognition (e.g. Newell & Simon, 1972) and also to the Metaplans of 
behaviour of Miller, Galanter & Pribram (1960). 

Evidence for the existence of planning and decision making in behaviour is available directly 
from Luria’s (1966a, b) research, in which cortical localization for all three components has 
been established. The occipital-parietal area is primarily responsible for simultaneous synthesis, 
the anterior regions are responsible for successive synthesis, and the frontal lobe is responsible 
for planning and decision making. 

The planning and decision-making unit then, directs the choice of a simultaneous or successive 
strategy for a given task. Tests which are highly homogeneous are perceived uniformly by 
various populations in terms of their task demands, and in turn, tap primarily one type of 
synthesis. In contrast, the value of heterogeneous tests is that they not only tap effectiveness of 
a chosen strategy, they also tap the choice itself. 

There is a definite need to gain information systematically on strategy choices which are 
characteristic of different populations on theoretically important tasks. The extent to which these 
strategy choices are due to ‘set’ and other psychosocial variables, and further, the extent to 
which strategies may be modified by instructions or training, are of direct importance, especially 
in terms of educational implications. The empirical dimension of homogeneity-heterogeneity of 
strategy, in combination with the theoretical central processing unit of the Das et al. (1975) 
model, offer a framework for this type of research. Specifically, data on Level I and Level II 
abilities can also be interpreted in terms of broader implications than has been the case. 


Discrimination power and developmental differentiation 


A further characteristic of the theory of Level I-II abilities which has led directly to the 
controversial aspects of the model is the differential discriminating power of Level II tests as 
opposed to Level I tests. These differences are clearly demonstrated in Jensen (1973 b) where 
factor score profiles are constructed for three ethnic groups, and the trends indicate that strong 
group differences exist on tests of Level II ability, but differences are marginal in the case of 
Level I tests. Stated in terms of analysis of variance, Level I and Level II abilities interact with 
population groups and socioeconomic status groups. Jensen has found that these interactions 
may be ordinal or disordinal, dependent upon the groups under consideration (see Fig. 1, 
Jensen, 1973 b), but in all cases the source of interaction is markedly disparate mean group 
differences on Level II tests, in comparison to Level I tests. 
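
The distinction between ordinal and disordinal interactions can be illustrated with a 2 x 2 table of cell means. The sketch below is a simplified illustration and not Jensen's analysis: the group labels, the function names and all of the means are hypothetical, and the classification looks only at whether the direction of the group difference is the same on both types of test.

```python
def group_gaps(means, groups=("A", "B"), tasks=("Level I", "Level II")):
    """Difference between the two group means on each type of test."""
    first, second = groups
    return {task: means[first][task] - means[second][task] for task in tasks}

def classify_interaction(means, tol=1e-9):
    """Return the interaction contrast and its type for a 2 x 2 table."""
    gaps = group_gaps(means)
    # Interaction contrast: the difference between the two group gaps.
    contrast = gaps["Level II"] - gaps["Level I"]
    if abs(contrast) <= tol:
        kind = "none"
    elif gaps["Level I"] * gaps["Level II"] < 0:
        kind = "disordinal"  # the direction of the gap reverses
    else:
        kind = "ordinal"     # same direction, different size
    return contrast, kind

# Hypothetical cell means in arbitrary score units.
ordinal_means = {
    "A": {"Level I": 50, "Level II": 55},
    "B": {"Level I": 49, "Level II": 45},
}
disordinal_means = {
    "A": {"Level I": 48, "Level II": 55},
    "B": {"Level I": 50, "Level II": 45},
}
ordinal_result = classify_interaction(ordinal_means)
disordinal_result = classify_interaction(disordinal_means)
```

In both hypothetical tables the contrast arises because the group gap on Level II tests is much larger than the gap on Level I tests, which is the pattern described in the text.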

In discussions of these trends, Jensen notes quite accurately that they cannot be explained 
easily in terms of different difficulty levels between tests of the two abilities. Memory tests can 
be constructed to be as difficult as desired (low difficulty indices) and, conversely, tests of Level 
II can be developed with high difficulty indices. Jensen's point is that when both Level I and 
Level II tests meet the usual psychometric requirements in terms of the optimal range of 
difficulty indices, the tests will still have markedly different discrimination powers in 
comparisons of different SES and ethnic groups. 

The different discriminating powers of tests of the two abilities are explained by Jensen 
through use of the continuum of information transformation. Level II tests are said to demand 
more information transformation than Level I tests and therefore discrimination power increases 
as a function of quantity of transformation. 

In terms of developmental trends, Jensen has hypothesized that Level I-II abilities are 
characterized by different growth curves. These are summarized as: ‘... Level I rises rapidly 
with age, approaches its asymptotic level relatively early, and shows little SES difference, as 
contrasted with Level II, which does not begin to show a rapid rise until four or five years of 
age, beyond which the SES groups increasingly diverge and approach quite different asymptotes’ 
(Jensen, 1970, p. 165). 

It has been argued previously that simultaneous and successive syntheses are replacements for 
Level II and Level I abilities respectively. Further, it has been stated that these two forms of 
syntheses do not form a hierarchy, but instead are orthogonal in factor analytic terms, that is, 
functionally independent. This latter claim was consistent with Luria’s (1966a, b) original clinical 
observations, the factor analyses which extended this theory (Das, 1973 b; Das et al. 1975), and 
finally, in the reinterpretation of Level I-II abilities, it is consistent with Jensen’s data (Jensen, 
1973 b). 

Despite the finding that simultaneous and successive syntheses are functionally independent, 
the question of equal discrimination power can still be raised. Luria (1973 a) does not discuss this 
aspect of the forms of syntheses but data from other sources, in addition to the substantial body 
of data gathered by Jensen, appear to indicate that successive synthesis is potentially a less 
discriminating cognitive activity than is simultaneous synthesis. Miller's (1956) classic paper on 
the limits of serial memory argues cogently for reduced individual differences in successive 
synthesis, as does much subsequent literature. With respect to simultaneous synthesis, various 
related constructs are in evidence which convey similar ideas regarding task demands and 
discrimination power. Pascual-Leone & Smith’s (1969) notion of central computing space for 
example, may be interpreted to mean that the multidimensional aspects of simultaneous 
synthesis could well hold more potential in terms of increased cognitive demands. 

Indications that simultaneous synthesis may be more powerful than successive synthesis as a 
discriminator of cognitive maturity, in turn, are consistent with a recent and very significant 
body of developmental data on language, specifically, syntagmatic and paradigmatic clustering. A 
syntagmatic association is one between words which have some kind of sequential relationship 
to each other and are generally members of different grammatical classes forming a syntactically 
organized phrase, sentence or sentence fragment. Thus throw and ball are syntagmatically 
related. A paradigmatic association, on the other hand, is one between words of the same 
grammatical word class, and which are semantically substitutable (can replace each other in 
sentences). Thus throw and toss, and cold and warm are related paradigmatically. 

It has been shown that the syntagmatic—paradigmatic distinction is related ontologically to 
development which takes place during childhood. In free association (Brown & Berko, 1960; 
Ervin, 1961; McNeill, 1963; Routh & Tweney, 1972), and in free recall clustering (Denney & 
Ziobrowski, 1972; Denney, 1974), a tendency to categorize syntagmatically changes to a 
tendency to categorize paradigmatically. The shift takes place between the ages of 6 and 9 
(Denney, 1974a). The paradigmatic categorization tendency continues through college and 
middle age (Denney & Ziobrowski, 1972), to old age, when a shift back towards a syntagmatic 
tendency takes place (Denney, 1974b, c; Denney & Acito, 1974). 

The terms syntagmatic and paradigmatic have also been used as a classification scheme for 
various kinds of aphasic disorders (Jakobson, 1964; Gerschwind, 1970). Pribram (1971, pp. 
357-360) and Luria (1973 b) have expanded upon Jakobson’s scheme, and related it to Luria’s 
(1966, b, 1973 a) model of simultaneous and successive processing: both have stated that 
paradigmatic responses are a function of simultaneous processing and syntagmatic responses are 
a function of successive processing. Lesions which result in aphasic disorders of either type are 
situated within areas wherein lesions disrupt the corresponding type of processing. 

Collectively, these data indicate that the two forms of processing are not of equivalent status 
developmentally and in terms of discrimination power. It has been established that both varieties 
can be identified in school children through cognitive measures as early as grade one (Das & 
Molloy, 1975), but when the language research is considered also, indications are that 
simultaneous synthesis is a more advanced form of cognitive activity. 

In the light of the present research then, there appears to be some congruity between 
assumptions and evidence concerning Level I-II discrimination power and developmental 
differentiation, and the independent assumptions and evidence for simultaneous and successive 
syntheses. In this regard, the two theories appear indistinguishable. Perhaps research in the 
future will clarify this issue further through the use of new cognitive tests and further integration 
of the psychology of language. 


Summary and Conclusions 


This discussion has dealt with the basic assumptions and related empirical evidence for the 
theory of Level I-II abilities. The points that have been raised regarding the theory have been 
made primarily from the framework afforded by the model of simultaneous and successive 
syntheses. 

It has been argued that in terms of the definition of the two types of abilities by a continuum 
of quantity of information transformation, the definition of Level I is inconsistent with the 
current memory literature, the use of a memory-reasoning dichotomy creates some internal 
inconsistencies in the theory, the theory does not take account of control processes in cognition, 
and memory as an ability may be artificially separated from other cognitive processes. In 
complement, Level II ability cannot be explained as conceptual learning or reasoning. The model 
of simultaneous—successive syntheses appears more adequate in each of these areas, while still 
supported by Level I-II data. 

The empirical relationship between Level I and Level II abilities also appeared consistent with 
the simultaneous-successive model. Current data indicate more evidence for functional 
independence than for the hypothesized dependency relationship of Level II on Level I ability. 

A distinction was made between strategies and abilities, and this was related to the central 
processing unit of the simultaneous-successive model. Methodological suggestions were made 
which incorporated this distinction and the theoretical implications were noted. 

Finally, discrimination power and developmental differentiation were considered, and it was 
found that the two theories appear indistinguishable in these respects at the present time. 

The major conclusion of the foregoing discussion is that the theory of Level I and Level II 
abilities requires modification to accommodate related literature in cognitive and physiological 
psychology. When examined in terms of its essential internal assumptions, the theory of Level 
I-II abilities can be reinterpreted and placed in a broader perspective. Simultaneous and 
successive syntheses appear to be the best model for this purpose. 


Chlorpromazine and serial reaction performance 


L. Hartley, T. Henry and J. Couper-Smartt 





The effect of 25 mg or 75 mg of chlorpromazine on the serial reaction performance of 12 male human 
subjects was studied. 

Speed or number of correct responses was reduced by the high dose of chlorpromazine and errors of 
commission and omission were increased. The adverse effect of the drug upon speed appeared only at the 
end of the half-hour test. By contrast the adverse effect upon gaps and errors was apparent throughout the 
test and there was no interaction with test duration. 





A good deal of information about the effect of environmental stresses, such as sleep-loss, loud 
noise, length of work period, air pollution, heat and cold has become available. Much of this 
literature is reviewed by Poulton (1970). Broadbent (1971) attempted a theoretical integration of 
the results in terms of a two-factor arousal model. The introduction of a second factor into the 
explanation of the effects of environmental stresses, based upon the familiar curvilinear relation 
between performance and arousal, was necessitated by a number of interactions between the 
stresses. Principally, the facts that both loud noise and sleep loss had increasingly adverse 
effects over the course of a test, and yet cancelled if applied together, were difficult to reconcile 
with the single-factor model of arousal. The effect of time on task in increasing the adverse 
effects of the stresses suggested that an additional factor was present which normally 
compensated for the effects of stress. Although the adverse effects of noise might increase 
during the period of exposure, it was implausible to imagine that the slight increase in time 
without sleep, accumulated during the course of a test, could account for the decline in 
performance in that case. Consequently it was proposed that the compensatory mechanism 
became impaired during prolonged work, and thus the effect of sleep loss and noise grew as the 
organism’s ability to combat the stress fell during the work period. 

This two-factor model of the action of stresses has largely been devised to handle 
environmental stresses that have, in the main, appeared to increase arousal, with the exception 
of sleep loss. There are, therefore, few data on agents that are likely to decrease arousal, such 
as the phenothiazine drug, chlorpromazine. Such data might be especially interesting since they 
could lead to practical measures to combat overarousal associated with many environmental 
stresses. Indeed in the clinical setting phenothiazines have been used with some therapeutic 
success (Davis, Gosenfeld & Tsai, 1976) in patients displaying some aspects of overarousal 
(Lapidus & Schmolling, 1975). 

The introduction of the second arousal factor leads to the question of whether the 
arousal-reducing qualities of chlorpromazine occur at the level of the compensation mechanism 
or at the level of the primary arousal mechanism. The model, however, predicts different 
consequences of the different levels of intervention. Impairment of the compensation mechanism 
might be expected to cause an indiscriminate effect on all aspects of a task but this would 
also depend on the state of the primary arousal system. An increase in the trend to worse 
performance over the work period in all aspects of the task would be particularly expected, if 
primary arousal was also impaired. Prolonged work was a factor supposed particularly to impair 
the compensation mechanism. 

An impairment of the primary arousal system, whilst leaving the compensation mechanism 
intact should lead to an immediate impact in those aspects of performance over which the 
compensation mechanism has no control. There should be correspondingly little change in 
performance under compensatory control. 

A task which might reveal whether the experimental variable affects the primary or 
compensatory arousal mechanism, is the five-choice serial reaction test. In its usual form the 
subject is required to respond as quickly and accurately as possible, but at his own pace, to five 
alternative stimuli. The speed score, or number of correct responses made in the test, is 
supposed to be under control of both primary and compensatory arousal mechanisms. The 
subject controls his own rate of work, and can compensate for pauses or periods of slow 
responding by speeding up his work at other times. If chlorpromazine therefore affected only the 
primary arousal mechanism it would not be expected to influence rate of work immediately, 
because of the action of the compensatory arousal mechanism in maintaining optimum 
conditions. 

An additional action of chlorpromazine upon the compensatory mechanism would, in contrast, 
allow the full effect of impairment of the primary mechanism to appear in the speed score. 

The other two measures of performance at the test, commissive errors and omissive errors 
(gaps or pauses in performance), are less clearly subject to compensatory control. Both 
commissive and omissive errors, once made, cannot be rectified by changes in performance, as 
periods of slow responding can be. Both kinds of error are measures of discrete failures in 
performance that compensation cannot obscure. In contrast, the speed score incorporates the 
compensations for failures in performance in the average record. 

The two error scores might therefore be expected to reflect action of the drug upon the 
primary arousal mechanism irrespective of its influence upon the compensatory arousal 
mechanism. Comparing the action of chlorpromazine on speed with commissive and omissive 
errors should distinguish between an effect of the drug on both compensatory and primary 
arousal and on primary arousal alone. 

An important supportive finding is that of Mirsky & Rosvold (1960). They found that 
chlorpromazine had greater impact on an experimenter-paced task than on a subject-paced task. 
The important feature for the present experiment is that the experimenter-paced task measured 
discrete failures in performance undisguised by compensation. The subject-paced task 
incorporated efforts at compensation for inefficiency in the average score. Mirsky & Rosvold’s 
experiment therefore points to an action of chlorpromazine on the primary arousal mechanism. 

In the present experiment the effect of chlorpromazine on the five-choice serial reaction test 
was studied. It was anticipated that there would be differences in its effects upon speed and 
commissive and omissive errors, relating its action to the primary arousal mechanism. 


Method 
Design 


Each subject received three drug treatments orally administered: placebo, a low dose and a high dose of 
chlorpromazine. The three drug treatments were presented in an order determined by a Latin Square 
counterbalanced for first-order transfer effects. Testing on the five-choice under each drug condition took 
place not less than four days apart, and usually treatments were separated by one week. Drug treatments 
were administered blind. 
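
The design statement can be made concrete with a short sketch. The paper does not list the treatment orders actually used; the Python below assumes, for illustration only, that all six possible orders of the three treatments were used, which is one standard way of balancing a three-treatment design for first-order transfer effects. With 12 subjects this would give two subjects per order. The names `TREATMENTS`, `all_orders`, `position_counts`, `carryover_counts` and `assignment` are hypothetical.

```python
from collections import Counter
from itertools import permutations

TREATMENTS = ("placebo", "low", "high")


def all_orders(treatments=TREATMENTS):
    """Return every possible order of the treatments (6 orders for 3 treatments)."""
    return list(permutations(treatments))


def position_counts(orders):
    """Count how often each treatment occupies each session position."""
    return Counter((pos, t) for order in orders for pos, t in enumerate(order))


def carryover_counts(orders):
    """Count how often each (previous, next) pair of treatments occurs in adjacent sessions."""
    return Counter(pair for order in orders for pair in zip(order, order[1:]))


orders = all_orders()

# Hypothetical assignment (an assumption, not from the paper):
# 12 subjects, two per treatment order.
assignment = {subject: orders[subject % len(orders)] for subject in range(12)}
```

Under this assumption each treatment occupies each session position equally often, and each treatment immediately follows every other treatment equally often, which is what balancing for first-order transfer effects requires.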


Subjects 


Twelve male students and employed men served as subjects. Ages ranged from 15 to 26 years. All subjects 
were reimbursed for their inconvenience and transportation costs. All subjects were given blood and liver 
function tests, and only asked to participate if these were judged within the normal limits by a qualified 
medical officer. 


Apparatus and drugs 


Subjects were tested on the five-choice serial reaction task (Leonard, 1959). The subject sat before a vertical 
display and a horizontal response board. The display contained five light-emitting diodes arranged in a 
pentagon. The response board contained five 2.5 cm brass discs arranged in a pentagon identical to the 
display. The subject was required to tap with a spring-loaded brass stylus the disc corresponding to the 
source that was illuminated. When the subject tapped any of the five discs the display changed immediately 
to another source. There were three scores of performance on the test. First, speed of performance was 
assessed by measuring the number of correct discs tapped (the ‘corrects’ score). Second, commissive errors, 
or the number of incorrect discs tapped, were recorded (the ‘error’ score). Finally, omissive errors, or the 
number of pauses of greater than 1.5 sec between tapping one disc and tapping the next, were recorded (the 
‘gaps’ score). 
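
The three scores can be illustrated with a minimal scoring sketch. The record format (a time-ordered list of tap time in seconds, lit source and tapped disc) and the names `score_taps` and `GAP_THRESHOLD_S` are assumptions made for illustration; the paper does not describe how the apparatus logged responses.

```python
GAP_THRESHOLD_S = 1.5  # pauses longer than this count as a gap (value from the text)


def score_taps(taps):
    """Score a time-ordered list of (time_s, lit_disc, tapped_disc) tuples.

    Returns a tuple (corrects, errors, gaps):
      corrects - taps on the disc matching the lit source
      errors   - taps on any other disc (commissive errors)
      gaps     - pauses of more than GAP_THRESHOLD_S between successive taps
    """
    corrects = sum(1 for _, lit, tapped in taps if lit == tapped)
    errors = len(taps) - corrects
    times = [t for t, _, _ in taps]
    gaps = sum(1 for earlier, later in zip(times, times[1:])
               if later - earlier > GAP_THRESHOLD_S)
    return corrects, errors, gaps


# Hypothetical record: five taps, one on the wrong disc, one long pause.
example_taps = [(0.0, 1, 1), (0.6, 3, 3), (1.1, 2, 4), (3.0, 5, 5), (3.5, 1, 1)]
example_scores = score_taps(example_taps)
```

Because a gap is defined on the interval between any two successive taps, an error and a gap can both arise from the same stretch of performance; the sketch counts them independently.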


Procedure 


Each subject spent the night preceding the test sleeping in the laboratory under close supervision, as part of 
a sleep research project. This procedure ensured that 10 hours elapsed before subjects received the drug 
treatment, so that any recreational drugs subjects had taken before the experiment might be largely 
metabolized. In any case, subjects were requested to abstain from recreational drugs during the day prior to 
the experiment. 

Following the normal night’s sleep in the laboratory, subjects were wakened at 7.30 a.m. and given breakfast, 
followed by the drug treatment at 8.15 a.m. All subjects were tested on the five-choice task for 30 min at 
approximately 2.00 p.m. on the same day, following lunch. 

On their first visit to the laboratory, subjects arrived early in the evening of the day preceding the test. 
They were then given 10 min practice at the five-choice. Subjects were told their scores following this 
practice. During testing following the drug treatments no information regarding level of performance or other 
incentive was given to the subjects. 

Prior to each of the three experimental tests, subjects were given instructions to work both quickly and 
accurately at the task, making as many correct responses and as few errors and gaps as possible. They were 
then told to start, and 30 min later, were told to stop responding. 

Apart from the placebo, two active drug treatments were administered: a low dose of chlorpromazine of 
25 mg per 68 kg of body weight, and a high dose of 75 mg per 68 kg of body weight. Sixty-eight kilogrammes 
is the average body weight of the young male New Zealander. Each dosage was adjusted proportionately for 
differences in body weight. The active drugs were administered orally, in a liquid vehicle consisting of 
peppermint and spearmint oils, citric acid, fruit cup 868, caramel, sucrose, water and chlorpromazine. These 
solutions were made up to contain 25 or 75 mg of active drug per 10 ml. Dosage was varied according to 
body weight by adjusting the volume of preparation given. 
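
The weight adjustment described above amounts to a simple proportion, sketched below. The constants come from the text (25 or 75 mg per 68 kg, made up in 10 ml); the function names and the example body weight are hypothetical and are not values from the paper.

```python
REFERENCE_WEIGHT_KG = 68.0  # reference body weight stated in the text
REFERENCE_VOLUME_ML = 10.0  # each solution held its nominal dose in 10 ml
NOMINAL_DOSES_MG = {"low": 25.0, "high": 75.0}  # mg per 68 kg of body weight


def volume_ml(body_weight_kg):
    """Volume of preparation to give, scaled in proportion to body weight."""
    return REFERENCE_VOLUME_ML * body_weight_kg / REFERENCE_WEIGHT_KG


def dose_mg(treatment, body_weight_kg):
    """Amount of chlorpromazine delivered by that volume for the given treatment."""
    concentration_mg_per_ml = NOMINAL_DOSES_MG[treatment] / REFERENCE_VOLUME_ML
    return concentration_mg_per_ml * volume_ml(body_weight_kg)


# Hypothetical subject weighing 81.6 kg (an assumed value for illustration).
example_volume = volume_ml(81.6)
example_high_dose = dose_mg("high", 81.6)
```

Since only the volume is varied, the same `volume_ml` calculation would apply to the placebo mixture.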

The placebo mixture was prepared according to the same formula except it contained quinine sulphate 
instead of chlorpromazine. The volume of the placebo administered was adjusted to body weight in the same 
way as the active drug was. 

The two drug dosages were selected according to a number of criteria. First, dosages were selected to be 
comparable with previous research in the area (e.g. Rappaport & Hopkins, 1971); second, these dose levels 
were believed to constitute a minimal health hazard to the subjects; third, since the initial experience of the 
drug is heavily sedative, higher doses might have caused too much incapacitation of subjects to secure useful 
data. 


Results 


Results were obtained for the speed score (corrects), commissive errors (errors) and for omissive 
errors (gaps) for each 5 min block of each of the three 30 min tests. Mean gaps, errors and 
corrects for each combination of block and drug treatment are presented in Tables 1, 2 and 3. 

Two-factor analyses of variance (drug treatment × blocks) were computed for the three scores 
using interactions of factors with subjects as error terms. 
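
Because every subject received every drug treatment in every block, each effect is tested against its own interaction with subjects. The sketch below, with the hypothetical function name `within_subject_dfs`, shows how the degrees of freedom for such an analysis follow from 12 subjects, three treatments and six blocks. It illustrates the bookkeeping only and is not the authors' computation.

```python
def within_subject_dfs(n_subjects, n_treatments, n_blocks):
    """Degrees of freedom (effect, error) for a fully within-subject two-factor design.

    Each effect's error term is that effect's interaction with subjects,
    so the error d.f. is the effect d.f. multiplied by (n_subjects - 1).
    """
    s = n_subjects - 1
    a = n_treatments - 1
    b = n_blocks - 1
    return {
        "drug": (a, a * s),
        "blocks": (b, b * s),
        "drug x blocks": (a * b, a * b * s),
    }


dfs = within_subject_dfs(n_subjects=12, n_treatments=3, n_blocks=6)
```

With these inputs the drug effect has 2 and 22 degrees of freedom and the drug × blocks interaction has 10 and 110.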


Gaps 


As Table 1 shows, chlorpromazine had a very pronounced adverse effect in the higher dose, 
causing a larger number of gaps to be made (X̄ = 12.87 gaps) in comparison to the low dose 
condition (X̄ = 4.36 gaps) and placebo (X̄ = 4.26 gaps) (F = 3.85, d.f. = 2, 22, P = 0.03). Both 
these comparisons with the high dose were statistically reliable by Wilcoxon’s test, T = 11, 
P < 0.05 in the case of the placebo comparison and T = 12, P < 0.05 in the case of the low dose 
comparison. 

Although there was the expected trend of increasing gaps during the course of the test 
(P = 0.003), there was no interaction between drug treatment and time on task. The conclusion 
that there was no interaction was true of both analyses of raw data (P > 0.05) and of log 
transformations of raw data (P > 0.7). 

The low dose of the drug seems to have remarkably little effect on gaps, but the high dose of 
the drug seems to have an adverse effect on gaps independent of time on task. 


Table 1. Mean gaps in each 5 min block for each drug treatment 

                               Blocks 
               1       2       3       4       5       6 
Placebo       2.92    3.83    3.67    3.33    3.25    8.58 
Low dose      2.42    3.75    4.17    5.08    4.92    5.83 
High dose     5.88   10.78   10.23   12.93   15.94   21.43 


Errors 


The effect of the drug treatment on errors was closely similar to that on gaps, as Table 2 shows. 
Chlorpromazine had a very pronounced adverse effect in the higher dose, causing a large number 
of errors to be made (X̄ = 18.92 errors) in comparison to the low dose condition (X̄ = 9.21 errors) 
and the placebo condition (X̄ = 9.22 errors) (F = 4.05, d.f. = 2, 22, P = 0.03). Both these 
comparisons with the high dose were statistically reliable by Wilcoxon’s test, T = 8, P < 0.02 in 
the case of the placebo comparison and T = 13, P < 0.05 in the case of the low dose comparison. 
Although there was the expected trend of increasing errors during the course of the test 
(P = 0.003), there was again no interaction between drug treatment and time on task. The 
conclusion that there was no interaction was true of both analyses of raw data (P > 0.80) and of 
log transformation of the raw data (P > 0.80). Again, the low dose of the drug seems to have 
remarkably little effect on errors, but the high dose of the drug seems to have an adverse effect 
on errors independent of time on task. 


Table 2. Mean errors in each 5 min block for each drug treatment 

                               Blocks 
               1       2       3       4       5       6 
Placebo       6.83    9.17    8.50    7.83   10.33   12.67 
Low dose      6.67    8.25    8.50   10.17   11.00   10.67 
High dose    11.56   16.07   16.85   18.78   23.66   26.60 
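
As a consistency check, the treatment means quoted in the text for gaps and errors can be recomputed by averaging the six block means in Tables 1 and 2. The sketch below does this in Python; the variable and function names are illustrative, and the tolerance of 0.01 simply allows for rounding in the published values.

```python
# Block means as printed in Tables 1 and 2 (blocks 1 to 6).
TABLE_1_GAPS = {
    "placebo": [2.92, 3.83, 3.67, 3.33, 3.25, 8.58],
    "low": [2.42, 3.75, 4.17, 5.08, 4.92, 5.83],
    "high": [5.88, 10.78, 10.23, 12.93, 15.94, 21.43],
}
TABLE_2_ERRORS = {
    "placebo": [6.83, 9.17, 8.50, 7.83, 10.33, 12.67],
    "low": [6.67, 8.25, 8.50, 10.17, 11.00, 10.67],
    "high": [11.56, 16.07, 16.85, 18.78, 23.66, 26.60],
}

# Treatment means quoted in the Results text.
REPORTED_GAPS = {"placebo": 4.26, "low": 4.36, "high": 12.87}
REPORTED_ERRORS = {"placebo": 9.22, "low": 9.21, "high": 18.92}


def mean_over_blocks(block_means):
    """Average the per-block means for one treatment."""
    return sum(block_means) / len(block_means)


def agrees(table, reported, tolerance=0.01):
    """True if every recomputed treatment mean is within tolerance of the quoted one."""
    return all(abs(mean_over_blocks(table[t]) - reported[t]) <= tolerance
               for t in table)


gaps_consistent = agrees(TABLE_1_GAPS, REPORTED_GAPS)
errors_consistent = agrees(TABLE_2_ERRORS, REPORTED_ERRORS)
```

Both checks pass, which supports reading the tabled figures as decimal values (for example 12.87 rather than a range).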


Corrects 


As Table 3 shows, there are two conspicuous features in the results. First, there was an adverse 
main effect of the high dose of the drug on corrects, F = 6.71, d.f. = 2, 22, P = 0.005. In all, 
fewer corrects were made under the high dose condition (X̄ = 333.14 corrects) than under either 
the low dose condition (X̄ = 379.57 corrects), P < 0.05, or the placebo condition (X̄ = 379.57 
corrects), P < 0.01. 

Second, there was a pronounced interaction between time on task and the high dose of the 
drug, F = 2.08, d.f. = 10, 110, P = 0.03. Number of corrects per block remained constant under the 
placebo condition. Whereas in the first block there was a conspicuous but non-significant tendency 
for there to be more corrects in the high dose than in placebo conditions, there was a steep fall 
in the high dose condition over subsequent blocks. In the last block fewer corrects were made 
under the high dose than under the low dose or placebo, P < 0.05 in each case. 

A similar pattern of declining performance over the test, occurred to a lesser degree under the 
low dose condition. 

In the case of this score, the effect of the high dose of drug was crucially dependent upon 
time on task; the adverse effect grew greater the longer the task was continued. 


Table 3. Mean corrects in each 5 min block for each drug treatment 

                               Blocks 
               1       2       3       4       5       6 
Placebo      381.6   376.3   376.4   382.0   378.5   382.7 
Low dose     376.3   365.5   361.4   358.8   349.3   352.1 
High dose    415.8   329.5   321.4   320.7   311.3   300.1 
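
The interaction between drug treatment and time on task can be seen directly in Table 3 by subtracting the placebo mean from each drug condition's mean, block by block. The short sketch below does this; the names are illustrative, and the calculation is descriptive only, not a substitute for the analysis of variance.

```python
# Block means as printed in Table 3 (blocks 1 to 6).
TABLE_3_CORRECTS = {
    "placebo": [381.6, 376.3, 376.4, 382.0, 378.5, 382.7],
    "low": [376.3, 365.5, 361.4, 358.8, 349.3, 352.1],
    "high": [415.8, 329.5, 321.4, 320.7, 311.3, 300.1],
}


def difference_from_placebo(table, treatment):
    """Per-block difference (treatment minus placebo), rounded to one decimal place."""
    return [round(drug - placebo, 1)
            for drug, placebo in zip(table[treatment], table["placebo"])]


high_minus_placebo = difference_from_placebo(TABLE_3_CORRECTS, "high")
low_minus_placebo = difference_from_placebo(TABLE_3_CORRECTS, "low")
```

For the high dose the difference is positive in the first block and increasingly negative thereafter, while the low dose shows a smaller decline of the same kind.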





Discussion 


The pattern of results generally confirms expectation. The omissive (gaps) and commissive 
(errors) error scores, which were thought to reflect the action of the drug on the primary arousal 
mechanism independently of its action on the compensatory mechanism, were adversely 
affected by the drug. The important feature of these results is that there is no interaction with 
the task duration. The adverse effect of the drug is, in these two scores, independent of the 
amount of time subjects have spent performing the test. At the start of the test and at the end of 
the test, the effect of the drug is the same. 

This would certainly be the expected effect of an action of the drug on primary arousal. 

By contrast, the effect of the drug on number of correct responses, or speed, does strongly 
interact with task duration. If anything, the effect of the drug is beneficial at the start of the test, 
but wholly detrimental at the end. This result is again consistent with an effect of the drug on 
the primary arousal mechanism alone, for it will be recalled that prolonged work was a factor 
that was supposed to impair the compensation mechanism progressively. Since measures of 
average performance, such as speed, should reflect the beneficial action of the compensation 
mechanism, the adverse effect of the drug should only become apparent towards the end of the 
test in this score. Only after prolonged work has impaired the compensation mechanism will the 
adverse effect of the drug on primary arousal be observed in the speed score. 

As mentioned previously, a similar conclusion about the effect of chlorpromazine upon 
subject- and experimenter-paced tasks was drawn by Mirsky & Rosvold (1960) when comparing 
the percentage of impairment caused by 200 mg of chlorpromazine with 200 mg of secobarbital. 
They found that chlorpromazine had a relatively larger adverse effect upon a continuous 
performance task than on a digit symbol substitution task. The barbiturate, conversely, produced 
a larger effect on the digit symbol substitution task. In the continuous performance task, a 
stimulus requiring response was presented at a rate paced by the experimenter. The digit symbol 
substitution task was unpaced and allowed the subject to go at his own rate of work. 
Consequently the lapses in attention would have had no effect on the latter task, but should have 
effects on the continuous performance test. Kornetsky & Orzack (1964) provided partial 
confirmation of Mirsky & Rosvold’s (1960) suggestion as to the importance of task pacing in the 
adverse effect of chlorpromazine in two tests that were more closely similar than the original 
tests, apart from the pacing aspect. 

Mirsky & Rosvold (1960) also suggest that their two tests are reflecting the function of two 
different neurophysiological systems. On the one hand, barbiturates and a number of other 
depressants, which primarily affect the unpaced task, possess electrographic and 
neurophysiological similarities suggesting they have a major depressant effect upon the cerebral 
cortex. Chlorpromazine and sleep deprivation, which seem to have most impact on the paced 
task, are supposed to affect primarily structures of the medial reticular formation. 

This localization of at least one significant effect of chlorpromazine to the reticular formation 
is largely consistent with the recent suggestion that the antagonism of dopaminergic receptors by 
chlorpromazine is important behaviourally (Snyder, Banerjee, Yamamura & Greenberg, 1974; 
Iversen, 1975). Although there are several areas in the central nervous system containing 
dopaminergic neurones, certainly the largest fibre tract rises at the level of the mesencephalon 
(Fuxe, Hökfelt & Ungerstedt, 1970). 

If the idea of two mechanisms of arousal is adopted (Broadbent, 1971), the present results can 
be explained by supposing that chlorpromazine only impairs the primary mechanism of arousal 
which is concerned with the execution of well-established decisions. The drug may have this 
feature in common with noise and with sleep loss as Mirsky & Rosvold (1960) suggest, and the 
idea is consistent with what pharmacological information is available on its site of action. 


Book reviews 


Personality Description in Ordinary Language. By D. B. Bromley. London: Wiley. 1977. Pp. 278+x. 
£9.00. 


Perhaps this book can best be seen as the second volume of a research-in-progress series rather than as 
something in the way of ‘the last word’ in a ‘new approach to the scientific study of personality’, to quote 
the publisher’s blurb. The first volume of the series was published in 1973. In the course of the research a 
lexical and semantic content analysis of everyday language descriptions for aspects of personality has been 
developed. This method, the senior author claims, used in conjunction with his quasi-judicial method of 
deriving case-studies, provides a comprehensive conceptual framework for the study of personality. An 
important additional outcome of the research, it might be added, is the way it demonstrates the need for 
careful pre-planning at the information-gathering stage in casework, so that all relevant aspects of the 
particular client are taken into account. 

In the first volume of the series published in 1973, a study was reported in which nearly 3000 short, 
written descriptions of ‘self’ and ‘others’ were obtained from 320 schoolchildren at eight age-levels ranging 
from 7 to 15 years. These descriptions were first divided into ‘statements’ and then sorted into ‘content 
categories’, and the relative findings at different age-levels discussed. In the study reported in the present 
volume (chapters 5, 6, 7), 1920 ‘other person’ descriptions were obtained in writing from 240 adults at six 
different age-levels (20-70 years). A lexical and semantic analysis has been done on these and, drawing on 
the same material, the quasi-judicial method of deriving case-studies expounded. In a forthcoming volume 
the research will be taken a stage further. ‘Self’ descriptions by the same group of adults as in the present 
study will be analysed and the findings discussed. 

The most impressive feature of the present volume is the thoroughness with which the two complementary 
methods of the proposed approach to the study of personality — the method of content analysis and the 
method of deriving case-studies — are outlined and discussed. First, the lexical and semantic method of 
content analysis: this was derived by examining in detail the words and phrases of the descriptions obtained 
for the research. Thirty ‘content categories’ were drawn based on this analysis, each ‘content category’ 
describing a different aspect of personality, e.g. present circumstances, motivation and arousal. The 30 
‘content categories’ are grouped under three headings in the present book: ‘internal’, ‘external’ and ‘social’ 
and other aspects of personality. The lexical and semantic method of content analysis shows how 
constrictive ‘trait psychology’ in the study of personality tends to be, and, at the same time, demonstrates 
the importance of linguistics for study in this area. The second (and complementary) method in the suggested 
approach to the study of personality, the quasi-judicial method of deriving case-studies, is the development 
of an idea which has been current for many years in casework, but until now, has not had its potential fully 
explored. In 1917, for instance, M. E. Richmond commented on it in his book Social Diagnosis. The 
quasi-judicial method of case-study enables it to be seen how much our everyday understanding of people 
has in common with understanding at professional level, as this figures in personality description and theory. 
From an applied point of view, the quasi-judicial case-study takes the form of an explanation of the 
individual’s maladjustment or adjustment (depending on the particular field of casework), the basic aim being 
to formulate a meaningful argument to account for, or to justify, his behaviour. Development of these two 
methods, it is claimed, has now reached a stage where, used in conjunction, they constitute a viable 
approach to the study of personality, particularly the psychology of adjustment and adaptation. Review of 
the research to date would seem to substantiate this claim, and to suggest that the overall, stated aim of the 
research ‘to provide a general conceptual and methodological framework for the description and analysis of 
personality’ is on the way to realization. Such an achievement is impressive, if considered only from the 
point of view of the sheer amount of foundation-laying research involved. Much preliminary research, of 
course, remains to be done. This would include full-scale investigation of relevant aspects of topics such as: 
spoken — as opposed to written — personality descriptions; affective aspects of language mainly in regard to 
spoken descriptions; the psychological development of the self-concept; non-verbal communication in spoken 
descriptions of ‘self’ and ‘others’ - the smile, the hesitation, the silence, the fractional shrug, other 
expressive behaviour; examination of language, spoken and written, in a context of communication (this 
would seem particularly important in the case of this research, considering the general area — clinical 
psychology and social work - in which the research is being done). Research reports on some of these topics 
are promised in a forthcoming volume. Full-scale exploration of most of those listed would involve 
enormous expense at present, both in regard to research teams and equipment. Exploration in the not too 
distant future, however, is by no means out of the question — the widespread use of video-recording at 
present compared with, say, 20 years ago is an example of how rapidly technological innovation can become 
generally available. 

Finally, it is refreshing to see a clean break from the trait-cum-statistical approach to the study of 
personality, though whether this has been the strait-jacket on headway in research on personality which the 
author suggests, only many more years of research will reveal. It is pleasing too, to see what is clearly a 
promising approach to the study of personality under way which makes Allport’s musings 40 years ago about 
a ‘law of uniqueness’ figuring in this area of research some day, seem more than just a pipe-dream. 

JAMES G. M'COMISKY 


Handbook of Modern Personality Theory. By R. B. Cattell & R. M. Dreger. London: Wiley. 1977. Pp. v+804. 
£28.00. 


At £28, and 800 pages, this is a very large and expensive book. Does it fulfil its function as a handbook, and 
is it good value for what it is? There are 39 contributors, and as one might have expected there are great 
individual differences in the competence and value of their contributions. My main criticism would be that 
the book is not a ‘Handbook of Modern Personality Theory’, but essentially a ‘Handbook of Cattell’s 
Personality Theory’; other theories are not entirely neglected, but they are very much relegated to an 
inferior position. There is also a dearth of proper critical evaluation of Cattell’s own work; this would have 
been extremely valuable. It is well known that his scales have very low reliability, that it is doubtful whether 
the primary factors he lays stress on contribute very much above the contribution of the higher order factors 
of extraversion and neuroticism, that there has been a wholesale failure of replication when other people 
have tried to extract Cattell's factors from correlations between the items used in his questionnaires, and 
that when alternate versions of his scales have been administered, scales supposedly measuring the same 
factor have not necessarily correlated more highly together than they did with scales supposedly measuring 
different factors! These and many other criticisms may have a proper answer, but they are not dealt with in 
this book. This is a serious drawback, and gives quite a wrong impression of the state of the art. 

There are a number of chapters which do not have much to do with personality theory at all. There is for 
instance an excellent chapter on ‘Psychological genetics, from the study of animal behaviour’. This is highly 
technical, but it is doubtful whether the typical reader of a book such as this would be able to follow the 
argument, or see the relevance to personality. I would agree that such work is relevant, but the links are 
missing which would make this relevance clear to the uninitiated reader. Isolated chapters of high excellence 
do not necessarily make a good book where integration is missing, and while many chapters do clearly 
show a very high quality, and summarize important areas of interest in an excellent manner (e.g. 
‘Physiological concepts and personality research’; ‘The genetics and development of sex differences’); these 
tend to be in marginal areas rather than in central ones. Perhaps my criticism of this book can be expressed
in one single figure; the bibliography contains ten times as many references to Cattell’s work as to that of
anyone else! This is not a reasonable allocation of space, and a book organized, insofar as it has any
organization, almost entirely in terms of Cattell’s system is not viable as a ‘Handbook of Modern 
Personality Theory’. On the whole therefore I doubt if many psychologists will wish to buy this book; they 
would be better served by reading isolated chapters in their library copy. 

H. J. EYSENCK


Theory and Practice in Interpersonal Attraction. Edited by Steve Duck. London: Academic Press. 1977. Pp.
438. £9.80. 


Steve Duck is the most active research worker in this country on interpersonal attraction, and has produced 
two other books and a number of papers on it. He is best known for his version of the filter-theory, and the 
finding that friends have similar constructs. Interpersonal attraction is one of the current band-waggons in 
US social psychology, and there are several good reviews of the research on the subject (e.g. Huston, 1974). 
The present book has an interesting and unusual plan: Section I consists of six expositions of theories by 
different and mainly American authors, together with reviews of supporting evidence; Section II consists
mainly of reprints of typical empirical papers by the same authors; Section III contains four chapters by 
British psychologists relating findings on interpersonal attraction to broader psychological approaches. 

The six theories presented in Section I represent the main range of North American viewpoints on 
interpersonal attraction, though they do not include several important directions of research, e.g. Walster & 
Berscheid on love, or the role of non-verbal signals. These authors are primarily concerned with the 
antecedents of attraction, not with the social processes involved. Clore maintains that attraction is mainly 
produced by positive affective experiences; Ajzen that it is due to the integration of information about the
other in a Fishbein manner; Stroebe concludes that we are attracted to those who evaluate us favourably, 
especially if we have low self-esteem; Murstein proposes a theory of stages, ending in marriage, based on 
exchange theory principles in different ways at each stage, though the evidence is rather scanty; La Gaipa
reviews a number of theories of exchange and reciprocity, without deciding between them; Seyfried 
discusses various explanations of complementarity, though admits that there is little evidence for the 
phenomenon. 

In Section II a number of papers by the same authors are reprinted, mostly published already in American 
journals. They share a certain style, reflecting the strengths and weaknesses of social psychological research 
in this area. They are carefully designed experimental studies, using the right statistics. They succeeded in 
operationalizing theoretical variables, and in distinguishing between different theories. On the other hand 
they mostly used highly artificial experimental situations, in which subjects expressed their attraction to 
imaginary others, or to people whose photographs they had seen, or to people they had met very briefly. 
There is a certain amount of evidence that experiments like this can get results which are simply wrong
(Argyle & McHenry, 1971). Some of the results are not very surprising, like the finding that male students 
are more attracted to attractive females. 

Section III contained, for me, the most interesting chapters. Cook gives a very good account of the social
skills - NVC - rules approach to attraction, though concentrating on the skills of seduction. Harré’s paper is 
the most original in the book. He uses a Goffmanesque natural observation method, and notes the rituals of 
drinks, meals, etc., used to establish and maintain relationships in different subcultures. He draws attention 
to different kinds of relationship including negative relationships, and the ways they are maintained. Kelvin 
also has ideas rather than data, and points to the loss of power and increase of vulnerability brought about 
by friendship. Duck’s final chapter explores the links between attraction and personal construct theory in an 
interesting way: we compare our constructs with those of our friends, we discover what their constructs are, 
and elaborate our own constructs as a result. 

The field of interpersonal attraction has shown recent signs of life - the awareness of stages, and the study 
of love, for example. But the simple effects of similarity, proximity and attractiveness have been known 
since 1950, and simple theories of reinforcement, etc., have closed the minds of most investigators to more
interesting questions. Here are a few. (1) Why can’t some people make friends? This is a widespread source 
of distress. (2) What are the main forms of attraction and attachment? How about relationships between 
people at work, those of different age or status, platonic relationships, ambivalent relationships, persecutors 
and victims, etc? (3) What kinds of relationship occur in other cultures? (4) Why do relationships come to an 
end? Presumably personal constructs are still similar, for example. (5) What do friends (or other pairs) do 
together? Is this necessary to sustain the relation? I am glad to hear that Steve Duck’s current research bears 
on some of these issues, and I look forward to reading about these in a later book. 

MICHAEL ARGYLE 


ARGYLE, M. & McHENRY, R. (1971). Do spectacles really affect judgments of intelligence? Br. J. soc. clin. Psychol. 10, 27-29.
HUSTON, T. L. (1974). Foundations of Interpersonal Attraction. New York: Academic Press.


Face-to-Face Interaction: Research, Methods and Theory. By Starkey Duncan, Jr. & Donald W. Fiske. 
Hillsdale, N.J.: Lawrence Erlbaum. 1977. Pp. xxiii + 361. £13.90.


Starkey Duncan is best known for his review of non-verbal communication, in which he distinguished 
between ‘external variable’ and ‘structural’ approaches, and for his own research on turn-taking. The 
present volume reports an external variable study of non-verbal communication, and a more elaborate 
analysis of the turn-taking research. Each study involved a massive amount of data collection and analysis, 
and each is reported at some length. This is an important book for those concerned with dyadic interaction,
but it demands a lot of work from the reader to find out what was found out. 

The ‘external variable’ study consisted of correlating 49 non-verbal variables assessed for 88 subjects who 
each interacted with two different strangers for 5 minutes; about 30000 acts were scored. Clear sex
differences appeared: females smiled, laughed and looked more, had shorter utterances and less than half the 
number of filled pauses. Correlations between variables were low, and mainly between variables of the same 
type, e.g. laughing and smiling, or turn-taking signals. Correlations were found between scores on the 49
non-verbal variables and self-report personality scales; the authors conclude that there were no consistent 
findings. A number of the correlations were in fact quite high, but they were usually different for different 
sexes of subject and partner. This has been found in other studies, and been shown to fit into coherent PxS 
analyses. This study was not really an external variable study, since there were no hypotheses, and no
external variables (apart from sex). It was not so much misguided as unguided. Nevertheless a substantial
body of data was collected which will be of interest to other research workers. 

The ‘structural’ study consisted of a very detailed internal analysis of eight long conversations, using the 
kinaesic and paralinguistic methods of The Natural History of an Interview, which involves analysis down to
the level of the syllable. There was an exploratory study of the first two interviews, and the findings were 
replicated on the other six. Again there were no particular hypotheses, but clearly the authors were interested
in the signals for turn-taking, and in the signals described by earlier workers, like Kendon. This turned out to 
be a most interesting study because of the successful development of a linguistic-type model of interaction 
sequences. Here is an example of one of the findings: ‘speaker within turn signal’ is rapidly followed by 
‘auditor back channel display’ which leads to ‘speaker continuation signal display’. Each of these three 
kinds of signal is a category, containing a number of alternative signals; for example ‘auditor back channel 
display’ includes five different signals — head nods, sentence completion, etc. The results cover the effects of 
signals governing turns, i.e. complete utterances, and also smaller units within turns, governed by within-turn 
signals. A number of different signals may act as alternatives, or may act in combination. The principles of 
sequence are described as optional rules, since the probability of response is typically 30-60 per cent. In one 
case it is almost 100 per cent — the ‘speaker gesticulative signal’ which makes ‘auditor turn-taking attempts’ 
very unlikely. 

The findings are rather different from those of Kendon on gaze, and of Meltzer and Hayes on changes in 
voice amplitude; it would have been useful to know how far the new data disconfirm those earlier findings. 
Although the results are expressed in terms of ‘rules’, no subjective data were obtained to demonstrate the 
presence of rules, as opposed to lower level processes. It is perhaps not made explicit enough that 
turn-taking depends on the desires of interactors to speak and listen, on their roles (e.g. interviewer, 
therapist), and on their habitual styles of interaction. 

Nevertheless the main study is an outstanding example of the structural approach. This kind of research not 
only involves a lot of work, but it also requires ideas to direct the data analysis. It isn’t possible to carry out 
scientific research without ideas — to direct the data-collection and the analysis. The present authors do not 
reveal their ideas, but they were certainly there, and they are mainly based on the findings of previous 
investigators like Kendon. 

The last part of the book, on a meta-theory approach to interaction, draws the distinction between rules or
conventions, which produce the cooperative coordination of behaviour, and interactive strategies within the 
rules. This is rather similar to ideas current in this country of social skills operating within a context of rules,
except that goals and feedback are not clearly included. The authors are unwilling to admit subjective 
methods of analysis, and condemn attempts to measure the meaning of social acts — ‘Subversive’ and 
hopelessly inaccurate; on the other hand they quote with approval Goffman’s account of 24 explanations a 
motorist might give for driving through a red light (No. 15. ‘He’s an inspector testing the vigilance of the 
cops on duty’). 

Here, as before, a distinction is drawn between structural and external variable studies, and it is concluded 
that structural ones are best. However they can be combined. In some of our current research we are 
studying variations in structure as a function of situational (i.e. external) variables; for example it is
predicted that the grouping of functionally equivalent elements will be different in ways which can be 
predicted from the properties of situations. 

MICHAEL ARGYLE 


Social Exchange Theory: Its Structure and Influence in Social Psychology. By J. K. Chadwick-Jones.
No. 8 in European Monographs in Social Psychology, edited by H. Tajfel. London: Academic Press.
Pp. vi+431. £11.80.


This book attempts to give an account of all the work in social psychology which has been based directly on 
theories of social exchange propounded by Thibaut & Kelley, Homans and Blau, whose influential books 
were published between 1958 and 1965. 

The author painstakingly follows chronological order as he progresses from a description of Thibaut & 
Kelley’s game theoretical approach through Homans’ deceptively simple economic propositions, to Blau’s 
scholarly examination of exchange relations in organizations. Critics of the theories are given generous
treatment, and this gives rise to some interesting discussion of general questions of method and theory in the 
social sciences. 

One main problem with the book lies in Chadwick-Jones’ fidelity to the aims and intentions of each of 
these writers and their followers, and he is impatient of those critics who expect more from them than they
attempted. In the end, he is happy to be able to demonstrate that, indeed, exchange theory has ‘paid off’ in 
social psychology. Exchange theory has spawned a multitude of studies, and this is regarded by the author
as sufficient justification for the whole movement, despite weaknesses in the various theoretical positions. As 
he states, exchange theory is based on analogues from reinforcement and economic theories. This creates 
fundamental ambiguities, as in the case of Thibaut & Kelley, who, as Chadwick-Jones rightly suggests, deny 
‘the stricter game theory postulates while at the same time they accept a general maximizing assumption 
from which logically the stricter postulates could in the end be derived’ (p. 139).

Some interesting discussion in the concluding chapters provides two pointers to the future. Rapprochement 
is already evident between exchange theory and cognitive approaches in social psychology, as, for example,
between Homans’ most productive notion of distributive justice and Heider’s balance theory, and in this 
connection it may be observed that readers (especially from the USA) will be surprised by the somewhat 
scanty account of empirical work on ‘equity theory’, which has of late been exchange theory’s most 
clear-cut success. This development is, to some extent, opposed to the tendency for social exchange theory 
to assume greater importance in sociology and social anthropology. The essentially individualistic approaches 
of Homans, and of Thibaut & Kelley, take the dyad as the unit of analysis, and they both suggest that 
appropriate, balanced and essentially harmonious relationships will prevail. Blau, on the other hand, 
emphasizes that in organized social life and in intergroup relationships, exchange has a more wayward, 
conflictful and competitive cast. At this point one regrets that this volume was apparently written too late to 
make reference to Peter Ekeh’s publication Social Exchange Theory which centres on the distinction 
between ‘restricted’ (dyadic) and ‘generalized’ (collectively organized) exchange. The elaboration of such 
general distinctions might have helped integrate the carefully documented studies, viewpoints and arguments 
which characterise this very useful book, besides more effectively demonstrating the wider context in which 
exchange explanations may operate. 

The author is, however, to be congratulated on producing a volume which does enable stock to be taken 
of the most important movement experimental social psychology has generated since the field theoretical and 
subsequent cognitive approaches inspired by Lewin and his followers. As such, it has no serious competitor.

G. M. STEPHENSON


EKEH, PETER (1974). Social Exchange Theory. London: Heinemann.


Heredity, Environment and Personality: A Study of 850 Sets of Twins. By J. C. Loehlin & R. C. Nichols.
Austin: University of Texas. 1976. Pp. xii + 202. $9.00. 


Just as one of the aftermaths of the atomic bombs which ended Japanese resistance in World War II was a 
massive series of investigations in the 1950s by American geneticists into the effects of radioactive fall-out, 
which increased substantially our knowledge of many phenotypes, including behavioural ones, the advent of 
the Russian sputnik in 1957 was a prime factor in initiating a searching reconsideration of the bases of 
scientific training in the United States. The National Merit Scholarship Corporation was established and 
among its projects was one devoted to seeking out talent among the American high school students. In this 
way a large body of data was amassed and the authors of this book have capitalized on it in using it to 
ascertain and then to compare and contrast a sample of 850 same sex pairs of twins, 514 of them identical 
(monozygotic) and 336 fraternal (dizygotic). This they achieved by postal questionnaires, eliciting a response 
rate of no less than 79 per cent. The measures used included data from the National Merit Scholarship 
Qualifying Test - employed in the preliminary survey from which the twins were identified and which yielded 
data on cognitive abilities and achievements - the CPI (California Personality Inventory), and the Holland 
Vocation Preference Inventory. In addition, specially designed questionnaires were used to diagnose zygosity, 
to explore each twin’s interests and perceptions of his or her environment and to ascertain a parent’s view 
of it. 

The investigation of the relatively large sample thus obtained follows the classic pattern of twin analysis, 
and the collaboration involving J. C. Loehlin, well-known for his meticulous contribution to the analysis of
human psychogenetic phenotypes (witness those in the important book on racial differences and intelligence
co-authored with G. Lindzey and J. N. Spuhler), guarantees a thoroughgoing, open-minded and keenly
analytical approach to the available data. 

But the conclusions the authors arrive at are strangely muted. Not unexpectedly they find that 
monozygotics are in general more alike than dizygotics in respect of ability, personality and vocational and 
other interests. Typical summary (intraclass) correlations were 0.86 for monozygotics, compared with 0.62
for dizygotics for general ability, and 0.50 compared with 0.28 for the personality inventory scales and 0.37
and 0.20 for measures of goals, ideals and vocational interests. Such figures are in keeping with many of the
previous reports in the literature and contemporary interest lies in the extent to which they can be applied in 
the analysis of the multiple factors which undoubtedly give rise to them. Clearly both heredity and
environment are involved and it is now possible to do more than merely to partition such variation into the
appropriate contribution of each. Loehlin & Nichols are perhaps better placed to explore the subtle nuances 
of environmental causation since their data suggest quite strongly that the twins were on average not 
differentially treated as children even though there is evidence, both from parents and from the twins 
themselves that identical twins encountered more similar early experience than did dizygotic pairs. However 
the data indicate that these and other environmental influences are extraordinarily uniform in their effect 
both across the different characteristics measured and across the two sexes included in the sample. 
Environmental effects appear to be acting almost at random in these data and the authors explore the various 
possibilities which this surprising finding throws up. 

On the genetic side the analysis is, frankly, disappointing. Despite the limited nature of the design, 
involving merely the two biological sorts of twins, monozygotic and dizygotic, all reared together in the 
homes of their biological parents, with no data from, for example, other familial relationships or separated 
individuals, the analyses presented have not been pushed as far as the data do in fact allow. Certainly 
limiting assumptions in the analysis must be made, but biometrical genetical methods allow the testing of 
such assumptions as the absence of variation due to nonadditivity of genetic effects, such as dominance and 
epistasis, and the incidence of assortative mating (like marrying like). They can be tested during the course 
of the analysis and their effects, if any, can be evaluated in assessing the implications. These authors present 
us with no more than a most tentative calculation of one form of the heritability co-efficient for personality 
measures which they estimate to be of the order of 50 per cent — and none at all for cognitive ability. 
Considerations relating to genetic architecture and gene action generally are eschewed entirely. A pity, since 
they might allow speculations regarding the history of the kind of natural selection (directional, stabilizing
or disruptive) to which the behaviour underlying the characteristics measured had been subjected during the
course of man’s evolution. To this reviewer, the failure to push the analysis of the most interesting data 
collected to the limits which the sophisticated methods now available allow is a disappointing feature of an 
otherwise interesting volume.
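
As a rough illustration of the arithmetic behind such a coefficient, the intraclass correlations quoted in this review can be put through Falconer's formula, h2 = 2 * (r_MZ - r_DZ). This is a sketch only: the formula is one standard textbook estimator, assumed here for illustration, and is not necessarily the calculation Loehlin & Nichols performed. The function name is hypothetical. The estimate rests on just the kind of limiting assumptions mentioned above (additive gene action, no assortative mating, comparable environments for both kinds of twin).

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's heritability estimate: twice the gap between the MZ and DZ correlations.

    Assumption for illustration only; not taken from the book under review.
    """
    return 2.0 * (r_mz - r_dz)


# Intraclass correlations (monozygotic, dizygotic) as quoted in the review.
correlations = {
    "general ability": (0.86, 0.62),
    "personality inventory scales": (0.50, 0.28),
    "goals, ideals and vocational interests": (0.37, 0.20),
}

for trait, (r_mz, r_dz) in correlations.items():
    # Print each estimate to two decimal places.
    print(f"{trait}: h2 = {falconer_h2(r_mz, r_dz):.2f}")
```

On the quoted figures the personality estimate comes out a little under one half, which is broadly in keeping with the 'order of 50 per cent' cited; a full biometrical analysis of the kind the reviewer asks for would go well beyond this single subtraction.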

The book is elegantly produced, though a format which involves omitting page numbers in order to
accommodate tables, often for several pages in succession, does not make for easy cross-referencing. 
Paradoxically, perhaps, a principal merit of the book lies in the appendices which comprise over half the
contents. In this respect these authors follow Jensen’s admonition (in his article scrutinizing Burt’s published 
correlational data) that the provenance of data on behavioural inheritance should be carefully documented. It 
is good to find them published this way in extenso in these appendices. Here the interested reader who is 
disposed to try his hand at psychogenetical analysis with a view to seeing if he can take matters further than 
Loehlin & Nichols or, if he will, to disprove the relatively modest conclusions they arrive at has free rein. 
For not only are the questionnaires used reproduced as given, but the data from both the twin questionnaire 
and the parent questionnaire are reported in detail for each and every variable. This rich mine of information
largely counterbalances the paucity of the analyses referred to earlier but cannot, of course, overcome the 
basic limitations of the design used. Nevertheless, the data will be of considerable importance in the future, 
it cannot be doubted, and may, indeed, prove to be the most important aspect of this important work. 

P. L. BROADHURST 


Biology of Play. Edited by Barbara Tizard & David Harvey. London: Spastics International Medical 
Publication. London: Heinemann Medical Books. 1977. Pp. iv + 217. £6.50. 


A characteristically elegant introduction by Professor Bruner provides some links for this collection of 
conference papers. Such publications are liable to lack unity of presentation, aims, methods and even 
assumptions. Play activities can, of course, range over almost all voluntary behaviour by humans and many 
other mammals. There are also fashions in attitude towards the ‘value of play’ as educational or therapeutic 
tools. Examples of these, reviews of certain areas of study and some reports of experimental and 
observational investigations can be found in this collection. 

Some papers are excellent. Catherine Garvey’s presentation of her work on play with language in the 
context of speech-act theory can be singled out for its lucidity and coherence. She gives a concise 
introduction, and examples of play with sounds, word structure, incongruous ‘funny’ names and agreed
violations of normal contextual expectations between children. The most interesting aspects of her work,
more fully presented elsewhere, are the variation in the prosodic structure of utterances and ‘ritual’
repetitions with variations between play-partners that mark off playful from informative discourse. The term
‘non-literal’ which she uses for this is probably somewhat overinclusive. But the documentation, hitherto
mainly by students of animal behaviour, of the criteria and signals that characterize and delimit play-bouts in 
many species is among the most enlightening of observations about the seemingly paradoxical nature of
play activities. Kathy Sylva’s study is of interest as one of the few head-on experimental onslaughts on the 
vexed question of ‘use’. She compares construction of a stick by means of clamps to rake in a toy-reward, 
by children instructed either to first observe an adult solve the problem, or to ‘play’ with the materials, or 
watch an adult fix an irrelevant clamp. The latter were worst. ‘Observers’ and ‘players’ did not differ in the 
number of correct solutions or time. More ‘observers’ produced immediate correct solutions but ‘players’ 
needed fewer hints from adults, and gave up less easily. The study raises many questions; but this is a merit.
Dina Feitelbaum’s competent survey of cross-cultural studies shows that imaginative play differs sharply in 
amount and level between societies. Deborah Rosenblatt contributes to findings that ‘appropriate’ toy-play 
and imaginative games increase and become more complex in parallel with developments in other cognitive 
and linguistic skills. There is a tendency to infer too much from parallels. But she is not alone in this. Peter 
Smith reports further observations that middle class children play imaginative games more than working 
class and disadvantaged children, that the latter will play them more if encouraged to do so, and that boys 
use rough and tumble play more than girls. The main conclusion that not all play is of the same kind has 
been documented elsewhere. One wishes only that these distinctions were now put to use to formulate 
questions about relevant factors for study, and in making expositions less confusing. The finding by Judy 
Dunn and Carol Wooding that the mother’s participation in an activity can lengthen the infant’s current and 
subsequent attention span is interesting. Whether this requires a specific ‘play’ context is less obvious, 
unless the term is to be applied to any relaxed ‘warm’ interchange. Lilyan White contributes a brief survey 
on play in animals, and some observations about who plays with whom among young rhesus monkeys. Kay
Mogford describes differences in play by physically and mentally handicapped children. Alex Kalverboer 
demonstrates ably how different categories of play can be used to obtain quantitative data on neurologically 
disordered children who are not testable otherwise. Provision of playgrounds for urban Swedish housing 
estates, and of activities for children in British hospitals are surveyed by Pia Björklid-Chu, and by Lindquist,
Lind and Harvey respectively. Arnold Bentovim describes a form of play therapy. Finally, there is a spirited
attack by Barbara Tizard on the idea that ‘play’ amid a plethora of toys and a minimum of adult
participation and direction is essential or necessarily beneficial for learning. Apparently, this particular
‘progressive idea’ is not confined to the lunatic fringe of the fashion-conscious or incompetent. The picture
of children in nursery schools throughout the country left to romp or shovel sand all day, and of teachers
too brainwashed to know that habits of sharing and cooperation, and advances in practical and intellectual 
skills and knowledge require adult help, seems a little extreme. But her findings must be taken seriously. The 
moral for psychologists is to check practical implications of their findings carefully before drawing inferences 
for consumption by others, and to broadcast precisely what the evidence is for any of the educational 
fashions. It is never quite clear who are the arbiters of these. One suspects that they have at least as much 
to do with changing social aims and attitudes and economic conditions as with garbled versions of 
ill-established theories. But these problems go far beyond the topic of ‘play’. 

The collection is presumably intended for practitioners. The apparent lack of consensus about what is
being studied is therefore unfortunate. It is also quite unnecessary. It really is not useful any longer for
almost every contributor to discuss or provide slightly different global definitions; or to speculate, assert, 
deny or confess to apologetic bewilderment about ‘functions’. Play activities do not fit into the usual system 
of classification whereby behaviours are defined in terms of obvious end-states and evident functions. The 
reason is simple. For these activities end-states and functions are typically not obvious. They are among 
questions requiring study. ‘Play’ is not unique in this. But classification systems are matters of convenience
and global definitions are largely irrelevant in any case. Characteristics by which behaviours can be 
distinguished operationally are far more important. Many of these we do now have for play activities. I vote 
that a full list of known characteristics, and of categories of play to which contributors can refer, be given - 
once — in any future publication of the topic. (Martin Bax’s gallant attempt here ignores too much recent 
work.) It would reduce confusion in the reader and pinpoint relevant questions. I have implied that the wide 
range of study papers in this collection is not an unqualified blessing, but it could make selective reading
quite rewarding for a variety of people interested in ‘play’ for practical reasons. 

SUSANNA MILLAR 


Homo Loquens: Man as a Talking Animal. By Dennis Fry. Cambridge: Cambridge University Press. 1977. Pp.
177. Cloth, £5.95; paper £1.95. 


The task of the reviewer of popularizing books is never an easy one. The gulf between the Scylla of 
academic nitpicking and the Charybdis of the critical suspension of disbelief is narrow indeed. This difficulty
is compounded when, as in the present case, the author’s generally lucid and informative survey chapters are 
buffered by pages in which the tone on occasion almost reaches one of moral homily. The last chapter,
entitled ‘Thinking, feeling and speaking’, marks the climax in this respect. Some not particularly original or 
insightful remarks on the difficult question of the relation between language and thought lead into a few 
paragraphs on the purpose and role of speech in which the author manages to bring in, among other things, 
juvenile delinquency and brainwashing, the whole being topped off by the retelling of a Middle Eastern
folk-tale and the concluding advice that ‘it behoves us to take stock and to try to see how we can minimize 
the drawbacks that have accumulated around our facility for verbalizing and to discover in what directions 
there may lie possibilities of progress for homo loquens’ (p. 169). 

If we leave this dubious sermonizing on one side, however, the bulk of the book provides a welcome 
introduction to the study of speech as opposed to language. A general chapter, ‘Speech as brain-work’, sets 
the scene and the tone by introducing some fundamental linguistic concepts and by showing how speech is a 
complex integrative activity at the level of the brain. This ‘cybernetic’ approach is rightly emphasized in
various places throughout the book. I disagree, however, with his claim that ‘we make use all the time of 
multi-morpheme words like easier, hopelessness, strikingly, which are built up from the several morphemes 
rather than being listed complete in the brain dictionary’ (p. 11), a claim which goes against the results of a 
growing body of research in linguistics and psychology. 

The subsequent chapters of the book may be divided into three sections: (i) Phonetics - chapters 3 to 5
offer a neat encapsulation of present work in and attitudes to the study of linguistic sounds from the
articulatory, acoustic and auditory points of view. (ii) Speech and information — the two chapters on 
redundancy and feedback respectively are to my mind the best in the book and give admirably clear and 
concise accounts of two central notions in language study. (iii) Psychology and pathology of speech — three 
chapters providing a round-up of the available knowledge in the acquisition of language, and in language 
disorders ranging from lisping and stammering to complete aphasia. 

In general, Professor Fry reveals an enviable ability to synthesize the results of often highly complex and 
technical research into a readable narrative well suited to a general audience. The book is thus a useful one 
to recommend to prospective students wishing to gain some idea what is involved in the study of speech, 
and also to beginning students as a means of providing them with an initial overview of the discipline upon 
which they are embarking. As I indicated in my opening paragraph, any recommendation should go 
hand-in-hand with a warning that not everyone will, nor need, share the author’s ethical interpretation of the 
ideas and research he reports on. 

NIGEL VINCENT 


The Psychology of Rigorous Humanism. By Joseph F. Rychlak. New York: Wiley 1977. Pp. xi + 547. £14.00. 


The popular stereotype of the humanistic approach in psychology has it that it is inevitably woolly and 
soft-headed even though it may be dealing with important matters. In practice there is nothing intrinsic in the 
humanistic approach that prevents it being trivial and equally it can be systematic, logical and self-critical. 

Rychlak’s thoroughly workmanlike volume propounds the terms of a rigorous approach for humanistic 
psychology. He sets out the philosophical assumptions, the favoured methods and the scientific habits of 
what he likes to call a ‘logical learning theory’, that is a learning theory which is humanistic rather than 
mechanistic and which accepts purpose and meaning as central characteristics of human behaviour.

Rychlak crystallizes his long and detailed exposition into a series of ‘tenets of a psychology of rigorous 
humanism’. These include a recognition that theory and method are somewhat different enterprises and that
the former must not be simply read backwards from the latter. Research experimentation on human beings is
a relationship between the two identical organisms each of whom has a ‘say’ in the outcome. The principle
of parsimony dictates that any descriptive formulation applicable to the experimenter must be applicable to
the subject and vice versa. (It is interesting to note that the acceptance or rejection of the principle of 
reflexivity seems increasingly to divide humanist from mechanist psychologists.) The notion of a ‘variable’ 
should be confined entirely to methodological discussion and removed entirely from theoretical discussion. 
(This would break us of the bad habit of concretizing our favoured notions into ‘variables’ and thereby 
bullying others into accepting them.) Psychological theory should readmit formal and final cause 
constructions as legitimate explorations in their own right. (Rychlak is here arguing that we should bring 
back teleological descriptions of behaviour which scientists - for good reasons then - threw out in the 17th
century.) We should distinguish between humanistic and humanitarian approaches in psychology. Humanism 
is, in Rychlak's view, a purely technical aspect of psychological theories. Humanists do not have to be 
kindly disposed towards their fellow men, any more than behaviouristic psychologists have to be cruelly 
disposed towards them.

The book is well written with only a modest helping of pseudo-technological American coinages (though it 
does introduce ‘entitized’, for God’s sake). 
Perhaps Rychlak’s most convincing proof that humanistic psychology can be rigorous lies in the admirably
logical and systematic way in which he has written this book.

D. BANNISTER


The Philosophy of Mind. Edited by Jonathan Glover. London: Oxford University Press. 1977. 
Pp. 170. £1.50. 


The Philosophy of Mind is the latest addition to the Oxford Readings in Philosophy series which aims to make 
accessible to students and to the general reader articles tucked away in journals and unpopular books. It has 
the merit of containing, out of ten papers, three which should excite the interest of undergraduates, both
those in philosophy and those in psychology who have a taste for speculation. All three draw on scientific 
results, or on the fantasies of science fiction. 

Thomas Nagel in his ‘Brain bisection and the unity of consciousness’ begins with an account of Sperry’s 
brain bisection experiments which seem to be interpretable in two ways. The individual with a split brain 
may be thought of as having one or two minds, for the disconnected right hemisphere (‘which cannot talk’) 
ought not to be thought unconscious merely because mute. On the other hand we are tempted to invoke the 
two-minds hypothesis only in describing artificial experimental effects, leaving us the problem of explaining 
their apparent mental unity during the rest of everyday life. Nagel thinks there is no whole number of minds 
that Sperry's patients, and perhaps the rest of us, must have. He ends with a doubt. Future scientific
advances might make the idea of a unitary person seem quaint, but then we might also find ourselves unable
to relinquish the idea. Perhaps, Nagel thinks, there is a necessary limit to our understanding of
consciousness.

It is difficult to do justice to Bernard Williams’ entertaining paper on ‘The self and the future’. Williams
imagines a macabre experiment described in two different ways. In one description two persons A and B 
enter a machine which interchanges their memories and other ‘brain information’, leaving their bodies 
unchanged. The A-person-body, the unchanged body of the person originally A, receives the memories of B 
and vice versa. Williams’ question is: which of the two bodies is the body of A and which the body of B 
after the experiment? Williams imagines the experimenter asking A and B which person they would prefer 
rewarded and which punished after the experiment. A should rationally prefer the person associated with the 
B-person-body to get the reward, and vice versa. This conclusion suggests that a person is associated 
essentially with his brain information rather than with his body. 

Described differently this kind of case suggests the opposite. Imagine a man in the power of a torturer 
who informs him that tomorrow he will be tortured, but also relieved of his memory before the torture. It 
would seem rational for the man to face the morrow with fear and forebodings. The story may be filled out in
a number of ways. The man may have induced in him a replacement memory; secondly, this new memory
may be the memory of some other (unfortunate?) man; thirdly, the other man may acquire the prisoner's
original memory. Yet in all these cases we should expect the prisoner to experience rational terror. And this 
suggests that bodily continuity rather than memory is constitutive of personal identity. Williams ends with a 
confession. He is disturbed that he is not clear which options A and B ought to take. 

Derek Parfit in ‘Personal identity’ argues that such questions about personal identity need not have unique 
answers, though it is our strong prejudice that they should have, and that we should come to view them as 
being unimportant. Personal identity is a matter of both bodily and psychological continuity, both of which 
are a matter of degree. There is no further entity, ‘the self’, over and above these relative continuities. The
conundrums of personal identity arise from a mistaken belief in the self. Parfit ends with a hope, that the 
truth (Parfit’s view) may dispel bad ideas, like egoism, which depend on a false view of the self. 

The remaining papers in the collection are less striking. B. A. Farrell is concerned with the truth-criteria of
remarks made by psychoanalysts in their therapeutic practice. He argues that psychoanalytical diagnoses 
ought to be construed pragmatically, as meant to have an effect on the patient, and not as true or false. 
Patrick Gardner attempts to show that self-deception is too subtle and complex a concept to fit any one 
psychological model. G. A. Cohen argues, in ‘Beliefs and roles’, that a person cannot justify his beliefs by
pointing to the social role he plays. 

Stuart Hampshire in ‘Feeling and expression’ opposes both dualism and logical behaviourism (the view 
that statements about mental states are logically equivalent to complex statements about behaviour) by
arguing that though there are mental states they may in many cases be identified only by the behaviour they
cause or induce. Donald Davidson, in a paper that threatens to become much-anthologized, argues that
though mental and physical states may be identical, there need be no causal laws governing mental states
even though there are causal laws governing the corresponding physical states.

J. A. Deutsch argues that the psychologist may conjecture explanatory mechanisms of mind without 
feeling compelled to identify these with neurophysiological structures. This is consistent with the view
expressed by Hilary Putnam in ‘The mental life of some machines’, the most technically demanding paper in 
the collection, that to know about a person’s mental states is to know about his functional organization 
(loosely his program), which is distinct from his physicochemical structure and from his disposition to 
particular behaviours. Putnam takes this conclusion to be a denial of both materialism and of logical 
behaviourism. 

PETER GIBBINS 


Man in Urban Environments. Edited by G. A. Harrison & J. B. Gibson. London: Oxford University Press.
1976. Pp. 367. £7.50.


The social and psychological consequences of urbanization have long been a source of fascination and 
concern to scholars in several disciplines. Unfortunately the outcome of this interest has often been 
inadequate. Louis Wirth’s gloomy reflections on urbanism as a way of life were shown to be very partial, in 
both a spatial and temporal sense, and sociologists have since been understandably reluctant to embrace any 
kind of place-based determinism. Among planners, a disillusionment with utopian social engineering has led 
to a withdrawal into a fashion for consultation and participation which has as yet produced few positive 
manifestations. Geographers have for 50 years fought shy of an environmental determinism which at one 
time brought their discipline into grave disrepute. It is refreshing, therefore, that scientists from psychology, 
medicine and biochemistry have recently spearheaded a new interdisciplinary approach to the study of the 
human consequences of urban agglomeration. 

In America this new interest in the study of human milieux has led to a great deal of stimulating 
literature, and the Social Ecology departments at Stanford and Irvine are institutional monuments to its 
seriousness. More recently, ripples have washed across the Atlantic, and Harrison & Gibson's Man in Urban
Environments is a product of this new enthusiasm for issues which have a long, but to a large extent lost, 
pedigree. In the introduction, the editors clearly spell out their aims, aims which differ little from those of 
their American counterparts. Their purpose has been to assemble, in a broadly interdisciplinary way, as 
much evidence as is available concerning the effects on human health, efficiency and social response of as 
many facets of urban living as are seen to be important. Density, design, noise, light pollution and mobility 
are considered as causal influences, for urban life is seen to differ profoundly from rural or pre-urban in all
of these respects. Infant mortality, maturation, heart disease, obesity, workrate, drug use, mental health and 
suicide are among the problems seen to be accentuated by urban living. A vast body of evidence is sifted, 
with particular attention being paid to intervening mental and biochemical processes such as anxiety, 
aggression and hormone release. 

The need for books of this kind is obvious, for the field is grossly underresearched; the plan of the book, 
as outlined by the editors in the introduction and as summarized above, is good; it is a pity, therefore, that
the outcome is so disappointing. This is a cheaply produced book marketed at a very high price. The various
contributions, although written specifically for the editorial brief, lack cohesion and are often extremely
repetitive. Why do we need so many reminders, for instance, that urbanization is a phenomenon affecting
the whole world at an increasing rate? Many of the chapters have nothing to do specifically with urban
places. Surely it would not have been too onerous an editorial task to keep their contributors to heel in this
respect? The chapter on traffic problems, on the other hand, though clearly about towns, is nothing to do with
people, being merely a treatise on traffic engineering problems. The shortcomings of this book must be laid
squarely at the editorial door. Britain deserves a better introduction to Social Ecology.

HENRY IRVING


Music and the Brain. Edited by Macdonald Critchley & R. A. Henson. London: Heinemann Medical Books.
1977. £11.50. 


The literature on the psychology of music is not only sparse and scattered throughout a number of different 
specialist journals, but also suffers from the disadvantage that a significant proportion of it has been 
produced by writers for whom the area is a part-time interest. Therefore any attempt to produce a collection 
of papers in the area must be of interest. The underlying theme for this collection is the neurological aspects 
of musical experience and was stimulated by the Danube Symposium on Neurology which met in Vienna in 
1972 to consider the same area. The book is intended for a wide audience, not only for neurologists, and in
fact claims to be readable by ‘all of those who are interested in more than just the composers notes’, and 
certainly there is a good deal which can be read by the layman. 

With respect to structure, there are two distinct parts to the book. The first 16 chapters are concerned with
the various systems which are involved in musical functioning and the last eight chapters deal mainly with 
the effects of neural deficiencies on musical performance. The contributors come from a wide range of
disciplines. Anatomists, physiologists, pathologists, psychologists and otologists are all represented. The very
fact that contributors are from such different areas tends to produce isolated pools of knowledge which are 
rather unrelated, so that it is not possible, with some exceptions, to compare one writer’s attitude to a 
particular topic with another. One such exception is the topic of lateralization. Although not all these 
discussions occur successively, any comparison of them soon illustrates that we are far from understanding 
the role of the two cerebral hemispheres in musical functioning. Again due to the large number of 
contributors, each chapter must inevitably be rather brief and this often leaves the impression that the topic 
has merely been introduced. It would have been of more interest in these chapters if the authors had paid 
more attention to the problems of acquiring and dealing with data which are inevitable in this particular field. 
Some chapters tend to be rather factual in their content and there is relatively little interpretation of the facts 
and discussion of the material presented, with the result that these sections tend to read rather like 
summaries of what is known at present with no mention of how such knowledge is attained. Another minor 
complaint is that some chapters contain no references, which is rather frustrating if the reader wishes to find 
out more about the field which has been discussed. 

Although there are a number of deficiencies in Music and the Brain, it is nevertheless well worth reading
both by the general reader interested in music and also by the psychologist. In fact even at £11.50 it should 
find a place on the bookshelf of any psychologist interested in the psychology of music, but it is a pity there
is no paperback version to bring it within the price range of students. 

ANDREW ROSTRON 


Introduction to the Psychology of Hearing. By B. C. J. Moore. London: Macmillan. 1977. Cloth, £10.00; 
paper, £4.95. 


Any books on hearing which are readily understandable by psychologists, without any extensive background 
knowledge of topics such as physics and electronics, are particularly welcome, since readers tend to 
underestimate the extent to which they can understand the general issues involved. This introduction is 
especially welcome since the author is careful to explain the necessary technical concepts in a clear and 
easily understood manner so that the reader becomes gradually and gently introduced to the terminology 
necessary. 

The book starts with a simple description of the nature of the auditory stimulus and goes on to discuss 
how the auditory system responds to the physical stimulus, starting at the periphery and concluding with a 
survey of neural responses at the higher levels of the auditory system. The next chapter considers how the 
ear copes with the wide range of sound intensities encountered in the everyday world and includes a section 
on adaptation and fatigue, although more on recruitment and permanent threshold shift would have been of 
interest to the non-specialist. The extent to which sounds affect the perception of other sounds is then 
discussed in a chapter on frequency analysis and (primarily simultaneous) masking which terminates with yet 
another account of signal detection theory. 

Pitch perception and auditory pattern perception are then dealt with in a chapter, much of which is 
devoted to, as one would expect from this particular author, the evidence for pattern recognition and 
temporal models of pitch perception. There is also some consideration of the perception of music, which 
here means musical intervals, absolute pitch, and that curious term tone deafness. Perhaps the most 
interesting part of this chapter is the section on temporal patterning which indicates the richness of
psychological effects which can be obtained from varying the temporal structure of stimulus sequences and 
which do not as yet seem to have been studied in much detail by experimental psychologists. Indeed the 
effects question the extent to which the traditional psychophysical approach really provides information
about hearing, which normally occurs with familiar sounds and with distinct goals. In contrast the artificial 
psychophysical situation is a really strange acoustic environment and may well not give a true picture of 
hearing in realistic situations. A further confusing variable would seem to be the extent to which auditory 
imagery can influence listeners’ perceptions, a factor which is not taken into account by the traditional 
psychophysical approach. Considering that some composers can produce works without hearing them 
played, it is plausible that the sounds heard by a number of people are profoundly influenced by such 
auditory imagery.

The last chapter dealing primarily with laboratory stimuli is a chapter on the perception of auditory space,
which includes, for those interested more in applied aspects of psychology, a useful summary of some of the
work on obstacle detection by the blind. The final two chapters also have a distinctly applied flavour: one is
on speech perception and the other on new developments in auditory research. In the former the author
restricts himself to a survey of how acoustic cues are processed in the overall speech waveform, and while 
such a restriction is reasonable, the limitations of this experimental approach in which all the stimuli are
generated with speech synthesizers are not really considered. For example there is no mention that use of 
particular cues in a laboratory situation does not necessarily mean the same cues are also used when 
listening to normal speech. Again, some attention might have been drawn to the difference between 
synthesized and natural speech. Another issue that might have been discussed when dealing with the motor
theory of speech perception concerns the implications of this theory for the acquisition of normal speech patterns
by the congenitally deaf. Possibly a more surprising omission is that nowhere in the book is there any
detailed consideration of the processes involved in human noise production. This is perhaps rather
unfortunate as it implies that there is no important link between auditory production and reception. While it 
is true that you don’t have to be able to speak to understand language, in view of the importance of feedback 
in vocal skills it is very likely that there is considerable interaction between the hearing and production 
processes in the cases of both speech and music. 

One of the most interesting of the new developments is the section on auditory implants, where the 
acoustic signal is injected directly to the auditory nerve. Even here it would be good to hear of some of the 
disadvantages of the approach. For example it is at least plausible to suppose that for somebody who is a
proficient lip and sign reader the introduction of new and confusing auditory information may reduce their
ability to communicate rather than augment it. However the general feeling about this final chapter is that
there could have been more of all of it and that the topics discussed tend to whet the appetite for more
information. For instance some space might have been devoted to attempts to convert the speech signal into
forms which are still perceivable by the deaf. 

Despite the reservations mentioned above Brian Moore’s contribution to the literature on hearing is a
useful and valuable one and should provide considerable insight into the area for any psychologist interested 
in the many problems, both theoretical and applied, involved. 

ANDREW ROSTRON 


Moral Development and Behavior. Theory, Research, and Social Issues. Edited by Thomas Lickona. New 
York: Rinehart & Winston. 1976. Pp. xiv + 430. £10.25. 


Thomas Lickona has commissioned 20 essays from a distinguished group of scholars in the field of moral 
behaviour, including Kohlberg, Aronfreed, Hoffman and Bronfenbrenner. The time and care invested in his 
editorial work has clearly paid off: this volume makes a refreshing change from the hastily assembled 
collections we have grown accustomed to. The book takes a broad ‘integrative’ approach, and is in four 
parts. Lickona structures the field around eight ‘basic questions confronting a science of morality’ in his 
Introduction (Part I), and the other three parts deal with ‘Theoretical perspectives’, ‘Research’ and 
‘Morality and social issues’ respectively. 

Whilst an interdisciplinary approach is probably essential in a reference work like this, it brings its own 
disadvantages. In being scrupulously fair to all shades of theoretical opinion, the editor makes life difficult 
for us. We move from Eysenck’s ‘biology of morality’ (chapter 3), in which conscience is reduced to 
conditioned inhibitions, through Burton’s social learning theory (chapter 10), Aronfreed’s cognitive learning 
theory (chapter 3), the Mischels’ cognitive social-learning approach (chapter 5) and Simpson's 
‘cognitive-affective-conative’ synthesis (chapter 9), to the Piaget-based cognitive-developmental approach of 
Kohlberg (chapter 2) and Hoffman (chapter 7). Lickona’s review of research on Piaget's theory of moral 
development (chapter 12) suggests that his own sympathies lie at this latter end of the spectrum. Kohlberg 
presents an updated account of his theory of moral stages of development as well as a welcome outline of 
some of his techniques for the assessment of levels of moral judgement, and Rest, in chapter 11, develops 
these into his own Defining Issues Test. 

This is an excellent and well-balanced volume which contains several important papers. It is meticulously 
presented, and is augmented by biographies of the contributors as well as editorial footnotes. It can be
strongly recommended as a comprehensive and up-to-date reference text. 

DAVID J. HARGREAVES 


Brain, Behaviour and Drugs. By D. M. Warburton. London: Wiley. 1975. Pp. x + 280. £6.95. 


This book has both strengths and weaknesses. It represents a brave attempt to bridge the disciplines of
psychology, neurochemistry and neuropharmacology and as such is organised around the relations between
neurotransmitter systems in the brain and behaviour. The book includes, amongst others, chapters on
‘Control of homeostatic motivation’, ‘The biochemical basis of mood’, ‘Sleep and dreams’, ‘The
biochemical basis for drug dependence’, learning and memory (two chapters) and ‘Anxiety and stress’. Its
strength lies in the sound common-sense which the author exhibits when discussing interdisciplinary 
approaches to behaviour analysis. This is particularly evident in the chapter on the control of attention,
where Dr Warburton describes his own work and that of others on the role of cholinergic pathways in
modulating attention in animals. Its weaknesses are found in the naivety of the neurochemistry and much of 
the pharmacology. Many if not most of the references to these fields were five or more years out of date at 
the time of publication and there are numerous examples of oversimplification. For example, the important
inhibitory transmitter γ-aminobutyric acid is only mentioned once in passing, p-chlorophenylalanine is
referred to as a relatively specific inhibitor for the synthesis of serotonin and chlorpromazine is dismissed as
an α-adrenergic blocker. The chapters on clinical depression and schizophrenia are also dated and contain
several assertions which have not stood the test of time. 

In spite of these criticisms the book is a valuable contribution to the neurobiological literature. There are 
not many scientists who have the courage to relate their speciality to other disciplines in the way that Dr 
Warburton has. The book possesses an index and is well produced and illustrated. 

R. RODKNIGHT 


Revision Notes on Psychiatry. By K. T. Koshy. London: Hodder & Stoughton. 1977. Pp. vii + 148. £1.45.


This book is published within the ‘Modern Nursing Series’ and is obviously aimed at students undertaking a
psychiatric nurse training, although the foreword suggests its additional use for ‘social workers, occupational
therapists, probation officers and lay people involved in the treatment of psychiatric patients...’. As the title
implies, the book consists of a series of short notes relating to a moderately large number of headings which 
are arranged in alphabetical order. The headings themselves are an odd mixture of names of psychiatric 
disorders, methods of treatment and elements of nursing care and psychology, although the names of 
psychiatric disorders predominate. It is apparent on reading the book that the author adheres closely to the 
medical model and also leans toward a Freudian interpretation of psychiatric illness. Although notes on
specific behaviour therapy techniques are included among methods of treating phobias, the main explanation 
(albeit very brief) is found under the heading ‘Token economy’. Incidentally, the term negative 
reinforcement is wrongly used by the author, but perhaps this is not surprising in view of the confusion over
the use of the term, recently discussed by Green (1977). In general, the behavioural and the social 
psychiatric approaches to treatment, and the therapeutic community method of ward organization, are all 
badly presented by the author, who is a nurse tutor. 

I would not recommend the use of this book for nurses, and certainly not for other professionals and lay 
people in contact with psychiatric patients as mentioned above. There are many inaccuracies and distortions 
in the basic information derived from psychiatry and psychology, and also in references to illnesses which
have an underlying physical pathology. This is a pity, as in terms of the notes on methods used in
psychiatric nursing, there is much that is useful within the book, although it is covered very superficially.
Superficiality is inevitable in a book of this kind. The bibliography is reasonable, something which is still 
comparatively rare in books for nurses. 

Perhaps the book is misconceived. Psychiatric nursing is essentially concerned with relationships and to
present the subject in the format of revision notes, which is a form best suited to the presentation of
undisputed fact, may be an impossible task. There is much factual material which nurses must know, but
within psychiatry there are fewer incontrovertible facts than a reader of this book would suppose.

MARGARET CLARKE

(d) Figures, i.e. diagrams, graphs or other
illustrations, should be on separate sheets,
numbered sequentially ‘Fig. 1’, etc., and each
identified on the back with the author’s name 
and the title of the paper. They should be 
carefully drawn, larger than their intended size, 
suitable for photographic reproduction and clear 
when reduced in size. Special care is needed with 
symbols: correction at proof stage may not be 
possible. Lettering must not be put on the 
original drawing but upon a copy to guide the 
printer. Captions should be listed on a separate 
sheet. 

(e) Bibliographical references in the text should 
quote the author’s name and the date of 
publication thus: Bartlett (1953). They should be 
listed alphabetically by author at the end of the
article according to the following format:

[...] Factors. London: Harper & Row.

Fraser, C. O. (1976). Cognitive strategies and
[...] 399-406.


Particular care should be taken to ensure that 
references are accurate and complete. Where 
books are available in both hardback and paper- 
back please give references to both editions and 
publishers. For journal and other abbreviations 
use only those given in the B.P.S. booklet 
Suggestions to Authors, otherwise give all names 
in full. 

(f) SI Units must be used for all measurements, 
rounded off to practical values if appropriate, 
with the Imperial equivalent in parentheses. 

A guide to SI Units is given in the B.P.S. 
booklet Suggestions to Authors. 

(g) Supplementary data too extensive for publication 
may be deposited with the British Library 
Lending Division. Such material includes 
numerical data, computer programs, fuller 
details of case studies and experimental 
techniques. The material should be submitted to 
the editors together with the article, for 
simultaneous refereeing. Further details of the 
scheme are given in Bull. Br. psychol. Soc. (1977), 
30, February. Copies of Supplementary 
Publications may be obtained, at a cost of 5p a 
page (including postage), from British Library 
Lending Division, Boston Spa, Wetherby, 
Yorkshire, LS23 7BQ. 


5. Proofs are sent to authors for correction of print, 
but not for introduction of new or different material. 
They should be returned to the Press Editor, 
together with the typescript, as soon as possible. 
Fifty complimentary copies of each paper are 
supplied on request; further copies may be ordered 
when proofs are returned. 


6. Submission of a paper implies that it has not 
been published elsewhere. The author is responsible 
for getting written permission to publish lengthy 
quotations, illustrations, etc., of which he does not 
own the copyright. 


7. The tendency is growing for articles to be 
reproduced abroad without permission. To protect 
the interests of authors and journals the B.P.S. 
requires copyright to be assigned to the Society (by 
signing a form), on the express condition that 
authors may use their own material elsewhere at any 
time without permission. The author’s consent, and 
approval for a suggested fee, will be sought before 
applications to reproduce material are granted. 
Further details are given in the B.P.S. booklet 
Suggestions to Authors. 


The British Journal of Psychology 


Volume 69, Part 2, May 1978 


Levels of processing: A critique  157  Michael W. Eysenck 
Levels of processing: A reply to Eysenck  171  Robert S. Lockhart and Fergus I. M. Craik 
Levels of processing: A reply to Lockhart and Craik  177  Michael W. Eysenck 
Evaluating the worth of Sai be  179  W. R. Crozier 
Graphic rating scales - How many categories?  185  Stuart J. McKelvie 
Impulsivity/sociability and reinforcement in verbal operant conditioning  203  B. S. Gupta and Mohanjeet Nagpal 
The validity and reliability of self-report items for the measurement of lateral preference  207  Stanley Coren and Clare Porac 
A note on the development of recall of spatial location pene  213  J. M. Von Wright, P. Loikkanen and P. Reijonen 
Role of symmetry in pattern reproduction  217  J. B. Deregowski 
The subliminal perception of movement and the course of autokinesis  225  Peter Walker and R. R. Meyer 
An intra-cultural investigation of susceptibility to ‘perspective’ and ‘non-perspective’ spatial illusions  233  A. Ahluwalia 
Memory for prose: Quantitative analysis of recall components  243  I. M. Cornish 
Level I and Level II abilities: Some theoretical reinterpretations  257  Ronald F. Jarman 
Chlorpromazine and serial reaction performance  Zj  L. Hartley, T. A and J. Couper-Smartt 
Book reviews  27 
Other publications received  291 







© The British Psychological Society 1978 


CAMBRIDGE UNIVERSITY PRESS 
The Pitt Building, Trumpington Street, Cambridge CB2 1RP 
Bentley House, 200 Euston Road, London NW1 2DB 

32 East 57th Street, New York, N.Y. 10022 


Printed in Great Britain at the University Press, Cambridge 




Volume 69 Part 3 August 1978 ISSN 0007-1269 


Published for The British Psychological Society 


The British Journal of 


Psychology 


CAMBRIDGE UNIVERSITY PRESS 





Editor: A. D. B. Clarke 


Editorial Board 
J. Brown I. Martin 
N. F. Dixon B. Tizard 


Papers should be submitted, in accordance with the Notes for Contributors on the inside back 
cover, to The Editor, Professor A. D. B. Clarke, Department of Psychology, The University, 
Hull HU6 7RX. 


Books for review should be sent to Dr D. C. Kendrick, Department of Psychology, The University, 
Hull HU6 7RX. 


© The British Psychological Society 1978 


Permissions 
For permission to reproduce material from The British Journal of Psychology, please apply to the 
London or New York Office of Cambridge University Press. 

ISI Tear Service, 325 Chestnut Street, Philadelphia, Pennsylvania 19106, USA, is authorized to 
supply single copies of separate articles for private use only. 


Subscriptions 


The British Journal of Psychology is published quarterly in February, May, August and November by 
Cambridge University Press. Orders should be sent to a bookseller or subscription agent or direct 
to Cambridge University Press, P.O. Box 92, London NW1 2DB; and in the USA and Canada to 
Cambridge University Press, 32 East 57th Street, New York, N.Y. 10022. 

Single parts cost £7.00 (US$15.50 in the USA and Canada) plus postage. Four parts form 
a volume. The subscription price (which includes postage) of Volume 69, 1978, is £23.00 net 
(US$52.00 in the USA and Canada). Copies of the journal for subscribers in the USA and Canada 
are sent by air to New York to arrive with a minimum delay. Second class postage paid at New 
York, N.Y. 

Claims for missing issues will not be entertained unless made immediately upon receipt of the 
subsequent issue. 


Br. J. Psychol. (1978), 69, 295-303 Printed in Great Britain 


Factors influencing the response latencies of subnormal children in 
naming pictures 

Colin Elliott 





The times taken to name 56 drawings of objects on five separate occasions were analysed for 21 ESN(M) and 
21 ESN(S) children, matched for picture-naming vocabulary. The ESN(S) group not only had a higher mean 
response latency but also showed greater inter- and intra-subject variance. 

Nine objects were selected whose names have a Thorndike-Lorge language frequency of 50 words per 
million or greater, and nine others were selected with a frequency of less than 50 words per million. Each 
object was drawn in two ways, one giving a two-dimensional outline with the addition of important detail, the 
other drawing also incorporating cues indicating the depth of the object. An analysis of variance of the 
children’s latencies in naming the selected 36 pictures of 18 objects over five trials indicated that the method 
of drawing had no effect upon naming latencies. Pictures with high-frequency names were named faster than 
those with lower frequency names, the ESN(S) group showing a greater rate of increase in naming latency 
for the lower frequency words than the ESN(M) children. Results were discussed in terms of the Oldfield 
and Lachman models of lexical memory storage and of the search processes required for the retrieval of 
names. 


In examining individual differences in the speed of naming objects, it is clear that at least two 
major components should be considered: (a) the time taken to identify the object perceptually, 
and (b) the time taken to search for its appropriate label. 

The perception of objects or pictorial representations of objects by retarded children has 
received little attention from researchers. As far as normal children are concerned, a few studies 
(e.g. Ryan & Schwartz, 1956; Fraisse & Elkin, 1963) have employed tachistoscopes to present 
different types of pictures to children. Pictures may, for example, be photographs, outline 
drawings or cartoons. Starting with very brief exposure times, too small for the child to identify 
the picture, exposure time is gradually increased until the child can identify it. Differences in 
visual exposure time between different types of picture may be taken as a measure of their 
relative ease of perception. In reviewing such evidence, Gibson (1969) concluded that 
discrimination of pictures is facilitated when the potential number of feature contrasts is 
maximally displayed. 

Few studies appear to have been conducted which investigate differences in naming latencies 
for different types of pictorial material. As was previously noted, one of the components of 
naming latency is the time required to search for the name of an object, and Oldfield & 
Wingfield (Oldfield & Wingfield, 1964, 1965; Oldfield, 1966) have suggested that this is related to 
the frequency of occurrence of its name in the language. In a situation where an individual is 
shown an object, or its pictorial representation, there are no contextual clues which could 
partially determine what his choice of label will be. If the names of objects were stored in 
memory in such a way that they all required the same number of operations or decisions to 
retrieve them, there would be no variance in response latencies between objects labelled by an 
individual, although there may be variance between individuals as a result of differences in 
decision-making latency. Oldfield (1966) and Oldfield & Wingfield (1964, 1965) hold the view that 
labels are classified and stored according to the frequency with which they are commonly 
encountered, the store being so arranged that the higher the frequency of occurrence of names 
in the language, the more quickly and easily accessible they are. 

In order to test this hypothesis, Oldfield & Wingfield (1964, 1965) reported an experiment 
investigating the response latencies of 12 normal adults in naming 26 shaded drawings of objects, 
projected successively on a screen. The names of the objects were graded in terms of their 
frequency of occurrence in the language, as estimated in the Word List compiled by Thorndike 
& Lorge (1944). The results indicated a significant relationship between response latency and the 
logarithm of frequency of the name word (r = -0.8). This relationship held when the effects of 
length of word in terms of the number of syllables and the number of letters were partialled out. 
There was significant agreement in the rank order of the naming latencies between subjects, 
although there were also significant differences between subjects in terms of their mean speed in 
naming the pictures. Oldfield & Wingfield concluded that the results supported the theory that 
memory for object-names is organized on the basis of the frequency of occurrence of the names 
stored there. 
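
The analysis described above (a correlation between naming latency and log word frequency, with word length partialled out) can be sketched roughly as follows. This is only an illustration: the frequencies, syllable counts and latencies are hypothetical values, not Oldfield & Wingfield's data, and the helper names `pearson` and `partial_corr` are assumptions introduced here.

```python
import math

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y with z held constant."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical illustrative values (NOT data from any of the cited studies).
freq_per_million = [500, 200, 100, 50, 20, 10, 5, 2]
syllables = [1, 1, 2, 1, 2, 2, 3, 2]
latency_sec = [0.62, 0.66, 0.73, 0.75, 0.84, 0.88, 0.97, 1.02]

log_freq = [math.log10(f) for f in freq_per_million]
r = pearson(log_freq, latency_sec)                        # latency vs. log frequency
r_partial = partial_corr(log_freq, latency_sec, syllables)  # word length partialled out
```

A negative `r` corresponds to faster naming of more frequent names; `r_partial` indicates how much of that relationship remains once number of syllables is held constant.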

A number of studies have subsequently supported the theory, again using samples of normal 
adults (e.g. Briggs & Swanson, 1969; Swanson & Wickens, 1970), and Wingfield (1968), in a 
further experiment, concluded that ‘differences in naming latencies for common and rare objects 
are due primarily to differences in the time occupied in response selection, rather than in the 
time required for perceptual identification’ (p. 234). 

Carroll & White (1973) not only obtained the Thorndike-Lorge frequency of word names but 
also estimated the age at which the words would have been acquired in infancy by reference to 
developmental studies of children’s knowledge of words. They obtained the naming latencies of 
a group of normal adults for 94 pictures which were presented tachistoscopically. The results 
indicated that word frequency, age of acquisition and length of word all correlated significantly 
with naming latency. A multiple correlation (R) was calculated for the effect of these variables 
upon naming latency. The non-inclusion of age of acquisition significantly reduced R, but it was 
not reduced by the non-inclusion of either length of word or word frequency. Carroll & White 
concluded that age of acquisition is more important than word frequency in determining 
picture-naming latencies. 

More recently, Lachman (1973) and Lachman, Shaffer & Hennrikus (1974), in similar 
experiments on normal undergraduates, related naming latency to word frequency, age of 
acquisition and to lexical uncertainty (U). U was defined as a measure of the consensus among 
subjects in the names they produce for visual displays. Lachman found that naming latencies 
were significantly related to all three variables, U being the largest predictor. However, both 
word frequency and age of acquisition of names contributed independently to naming latency, 
and, indeed, all three variables were significantly correlated. 

It must be emphasized that the studies cited above were based on normal subjects. These 
findings provide general support for the contention of Oldfield & Wingfield that object names are 
stored in memory in such a way as to minimize retrieval times for certain words at the expense 
of longer times for others. 

No studies appear to have been conducted on subnormal subjects, in which effects of different 
word-retrieval latencies have been examined. However, the results from reaction time (RT) 
experiments suggest that response latencies in naming pictures may also tend to be longer and 
more variable as degree of retardation increases (Baumeister & Kellas, 1968a, b; Weaver & 
Ravaris, 1970). There is also evidence (Kellas, 1969) that both inter- and intra-subject RT 
variability becomes greater as degree of retardation increases. 

The evidence that speed of response is lower at lower levels of intelligence is supported in a 
recent study by Elliott & Murray (1977) which investigated children’s speed of problem-solving. 
Young children and those of low ability within each of four age levels solved easy block design 
problems significantly more slowly than older and abler children. On the basis of such evidence, 
therefore, a prediction can be made that response latencies in naming pictures will be longer for 
severely subnormal than for moderately subnormal children. 

What is more interesting, however, is the prediction of an interaction effect between degree of 
subnormality and word frequency. Following Lachman (1973), the lexical store can be 
conceptualized as a steady-state storage system, with the search algorithms necessary to access 
its contents being relatively fixed and increasing in complexity as word frequency diminishes. If 
this is so, it can be predicted that, while severely subnormal individuals would be slower at 
naming high frequency words, they would require relatively more time for the more complex 
algorithms associated with the retrieval of low frequency words than would moderately 
subnormal individuals. Thus for a given decrease in frequency, the naming latencies of severely 
subnormal individuals would be predicted to increase at a faster rate than those of moderately 
subnormal individuals. 

The presence of main effects but the absence of an interaction in the predicted direction 
between groups and word frequency may suggest that severely subnormal children are slower 
either in visually decoding pictures or in transmitting names to the articulatory system, but that 
their speed of access to the lexical store is similar. 
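
The predicted group by word-frequency interaction can be pictured with a small numerical sketch. It assumes, purely for illustration, that latency rises linearly as log frequency falls (in the spirit of the Oldfield & Wingfield finding) and that the two groups differ in both intercept and slope; the parameter values and the function names are hypothetical and are not estimates from this study.

```python
import math

# Hypothetical parameters: (intercept in sec, slope in sec per log10 unit of
# decreasing frequency). Illustrative only, not fitted to any data.
PARAMS = {"ESN(M)": (1.2, 0.20), "ESN(S)": (1.5, 0.35)}

def predicted_latency(group, freq_per_million):
    """Latency under the assumed linear-in-log-frequency search cost."""
    intercept, slope = PARAMS[group]
    # log10(1000 / f) is 0 at f = 1000 and grows as frequency falls.
    return intercept + slope * math.log10(1000.0 / freq_per_million)

def group_gap(freq_per_million):
    """ESN(S) minus ESN(M) predicted latency at a given word frequency."""
    return (predicted_latency("ESN(S)", freq_per_million)
            - predicted_latency("ESN(M)", freq_per_million))

gap_high = group_gap(100.0)  # a high-frequency name
gap_low = group_gap(10.0)    # a low-frequency name
```

Under these assumed values the gap between groups widens as frequency falls, which is the interaction pattern. If the two slopes were equal, only the main effects would remain, which is the alternative outcome described in the preceding paragraph.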


Method 
Subjects 


The terminology describing groups of retarded children differs between Britain and the USA. The groups of 
children reported in this paper are classified as moderately educationally subnormal (ESN(M)), or severely 
educationally subnormal (ESN(S)). The groups can be very approximately divided into those children with 
ratio IQs in the 50-75 range and those with ratio IQs below 50, respectively, and appear to correspond 
roughly to the American educable mentally retarded (EMR) and trainable mentally retarded (TMR) 
categories. 

The sample comprised 21 ESN(M) and 21 ESN(S) children who were attending day special schools in the 
North West of England. The children were matched in terms of performance on the English Picture 
Vocabulary Test (EPVT) for pre-school children (Brimer & Dunn, 1969). The test is similar to the Peabody 
Picture Vocabulary Test and provides a measure of the child's comprehension vocabulary combined with the 
ability to recognize pictures. Approximately 200 children were initially tested with the EPVT. Children were 
only tested if they did not have any visual or auditory impairment as far as could be ascertained, they did 
not have any gross motor impairment and they were able to speak intelligibly. This testing procedure 
produced 53 ESN(M) and 30 ESN(S) children who all achieved a raw score of between 17 and 27 on the 
EPVT, i.e. their age scores were between 4 years and 5 years, 21 children being selected randomly from 
each of these groups. The chronological age (CA) range of the ESN(M) group was 8 years 3 months to 11 
years 9 months (mean CA 9.79 years). The CA range of the ESN(S) group was 9 years 2 months to 14 years 
9 months (mean CA 11.75 years). 


Apparatus 


A total of 56 black-and-white line drawings were executed by an artist. The objects selected for drawing were 
common objects whose names would probably be reasonably familiar to subnormal children. The Mein & 
O'Connor (1960) word lists were consulted for this purpose. Objects were selected for drawing if they 
conformed to the following conditions: (a) their names were one syllable in length; (b) they could be 
represented as having depth, i.e. they were not flat objects. The drawings were two sets of 28 common 
objects, such as car, cup, brush, tree. Each set of 28 objects was drawn in two ways, one set with depth 
cues representing the objects in three dimensions, the other set without depth cues, giving only 
a two-dimensional outline with the addition of important detail. The drawings were photographed and slides 
prepared for projection. 

A control panel with start and stop buttons was operated by the experimenter. Pressing the ‘start’ button 
resulted in a spotlight projector being switched on, throwing a low-intensity spot of light of 7.5 cm diameter 
on a screen. After a period of 3 sec the spotlight projector was switched off while simultaneously a Kodak 
Carousel projector and a Watesta counter-timer were switched on, the projector throwing a picture 
approximately 50 cm × 30 cm on the screen. The noise of operation of the projector was greatly reduced by 
placing it in a wooden box lined throughout with expanded polystyrene of 2.5 cm thickness, with a hole cut 
in it large enough for the projector lens. A Kodak neutral density filter (ND 1.0) was placed in front of the 
slide projector lens. This cut the intensity of the projected light by 90 per cent. The purpose of this was to 
reduce to a minimum the possibility of dazzle and of the formation of visual after-effects. Pressing the ‘stop’ 
button resulted in the slide projector switching off and advancing in preparation for presentation of the next 
slide. The counter-timer also stopped, thus enabling the time to be read by the experimenter. The cycle was 
repeated on pressing the ‘start’ button again, which also reset the counter-timer to zero. The control of the 
projectors and the counter-timer was achieved by the use of solid state modular programming equipment 
produced by Behavioural Research and Development Limited. 
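
The 90 per cent figure quoted for the neutral density filter follows from the usual relation between optical density D and transmitted fraction, T = 10^(-D). A minimal check, assuming the ‘1.0’ in the filter designation is its optical density (the function name is introduced here for illustration):

```python
def nd_transmission(optical_density):
    """Fraction of light transmitted by a neutral density filter, T = 10**(-D)."""
    return 10.0 ** (-optical_density)

transmitted = nd_transmission(1.0)  # fraction of light passing the filter
blocked = 1.0 - transmitted         # fraction of light removed
```

With D = 1.0 one tenth of the light is transmitted, so nine tenths is removed.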


Procedure 


The apparatus was set up in a quiet room in each school. The curtains were drawn to exclude variations in 
the intensity of daylight entering the room and a 150 W bulb illuminated the room. Each child was tested 
individually and was seated 2 m from the projection screen. The projectors were placed on the other side of 
a dividing screen from the child, who was thus unable to see them. The child was also unable to see the 
front of the instrument cabinet in which the timer was mounted. The room was cleared of furniture and 
potentially distracting pictures and other objects were removed. The experimenter sat near to the child, 
within reach for purposes of handling and control, but slightly behind so as to be unseen while the child was 
looking at the projection screen. The experimenter had a small table on which were placed the control panel 
and paper for noting the response latencies. 

Before the start of the experiment, each child was given the following instructions: ‘I am going to show 
you a lot of pictures. I want you to tell me what the picture is as fast as you can’. The instructions were 
repeated or rephrased until it was clear that the child understood them. When the experimenter judged that 
there was sufficient rapport, a trial run of pictures was shown to the child. These pictures were of different 
objects from those in the experimental set. Before the pictures were presented, the experimenter told the 
child that before each picture he would see a spot of light which would tell him that the picture was coming. 
He was to look at the spot of light and then say what the picture was as fast as he could. These instructions 
were repeated and rephrased as necessary. Before each presentation the experimenter said, ‘Look at the 
screen. Get ready’. When the child was fixating the screen, the experimenter pressed the ‘start’ button. As 
soon as the child had named the picture, the experimenter pressed the ‘stop’ button and noted the time 
shown on the timer. During the trial run, the experimenter used as much explanation and verbal approval as 
was necessary to get the child to name each picture. A minimum of ten trial pictures were shown, but far 
more than this were required in some cases. Two ESN(S) children who had originally been selected for the 
experiment had to be discarded through failure to follow the instructions. 

When the experimenter judged that the child was responding reliably in the test situation, and before the 
first trial, all children were shown the experimental pictures printed on photographic paper. They were asked 
what the picture was, and, if no response or an incorrect response was made, they were told its name. This 
was to ensure that the names were available to the children and that the child was capable of responding to 
each picture. Sufficient practice was given to establish consistent naming. After a short rest period, the first 
trial was begun. Before each picture was presented, the experimenter said: ‘Ready’ or ‘Look’, and only 
pressed the ‘start’ button when sure that the child was fixating the screen. The experimenter gave regular 
verbal approval of the children’s responses in order to maintain their motivation. 

Five trials were administered to each child, with all 56 pictures being shown during each trial. The pictures 
were presented in a different random order for each trial, in order to control for any possible serial learning 
effect. Not more than one trial was given to each child during any one morning or afternoon. 

It will be evident that part of the variance of the response latencies obtained in the experiment will be the 
experimenter’s own variance in responding to the children’s utterances. In experiments with more able 
children, it would clearly be desirable to use a voice-operated switch to stop the timer, thus eliminating the 
reaction time of the experimenter. In the case of retarded children, however, their tendency to make 
irrelevant sounds necessitated the use of the experimenter as a ‘filter’. If the variance of the experimenter’s 
latencies were extremely high, however, it could swamp the variance due to experimental effects. In an 
effort to keep this down to a minimum, therefore, the experimenter had extensive practice in administering 
the experimental procedure before embarking on the experiment. The children in the experimental groups 
were only seen when the experimenter felt that the administrative procedure was as smooth as possible. 

In addition, the experimenter’s response latencies to the 28 words were examined for word-frequency 
effects. Ten latencies were recorded for each word. A one-way analysis of variance indicated that the 
variance between words did not significantly exceed variance within words (F = 1.2, d.f. = 27, 252, 
P > 0.05). A Spearman rank-order correlation of 0.06 between rank order of word frequency and rank order 
of mean response latencies was also non-significant. These findings support the view that the experimenter’s 
own response latencies contributed only random variance to the present results. 
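
The check on the experimenter's own latencies is a one-way analysis of variance with the 28 words as groups and ten latencies per word, which gives the reported degrees of freedom of 27 and 252. A minimal sketch of such an analysis is shown below; the latencies are simulated random values (an assumption made only so that the example runs), not the recorded data.

```python
import random

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Simulated experimenter latencies: 28 words x 10 recordings (hypothetical).
random.seed(0)
latencies = [[random.gauss(0.30, 0.05) for _ in range(10)] for _ in range(28)]
f_ratio, df_between, df_within = one_way_anova(latencies)
```

Because the simulated latencies are drawn from one distribution for every word, an F ratio near 1 would be expected, which is the pattern the paragraph above reports for the real recordings.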

It should be noted that one purpose of obtaining the response latencies of the children to the 56 pictures 
was to select stimulus words and pictures for a later experiment on recall, not reported here. The 
distributions of response latencies to all 56 pictures were initially analysed to examine differences in 
variance between the groups. At a later stage in the analysis, this number was reduced to 36, in a manner 
described later, to provide a balanced design for the analysis of word-frequency effects. 


Results 
Distribution of response latencies 


The total number of response latencies from the 42 children in naming 56 pictures over five trials 
was 11760. In the analysis which is reported in this paper, those response latencies which were 
of greater than 10 sec duration were excluded. Very long naming latencies are possibly the result 
of gross fluctuations in attention and are likely to have a disproportional influence upon the 
analysis. With a total of 5880 responses being made by the children in each group, 128 ESN(S) 
and 39 ESN(M) responses had a latency of 10 sec or greater, and were consequently deleted 
from the later analyses of 5752 ESN(S) and 5841 ESN(M) response latencies. 
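
The counts in this paragraph can be verified with simple arithmetic; the variable names below are introduced only for this check.

```python
children_per_group = 21
pictures = 56
trials = 5

per_group = children_per_group * pictures * trials  # responses per group
total = 2 * per_group                               # responses from all 42 children

retained_esn_s = per_group - 128  # ESN(S) responses left after exclusions
retained_esn_m = per_group - 39   # ESN(M) responses left after exclusions
```

These values agree with the figures given in the text (11760 in total, 5880 per group, and 5752 and 5841 retained).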

The mean latencies of the ESN(M) and ESN(S) groups were 1.718 and 2.208 seconds 
respectively, their standard deviations being 0.220 and 0.364 seconds. Both distributions had the 
typical positive skewness found in response-time data, i.e. the mode of the frequency 
distribution was at a lower point on the time scale than the mean. 

The frequency distribution of the naming latencies is likely to be the composite of two factors: 
within-child and between-child variance. The total variance of the response latencies was 
therefore partitioned in order to examine differences between the ESN(M) and ESN(S) groups in 
terms of these sources of variance. 

A comparison of the between-child variance of the two groups indicates greater variance in 
the ESN(S) group (F = 6.59, d.f. = 20, 20, P < 0.01). This is also the case when the within-child 
variance of the groups is compared (F = 2.77, d.f. = 5731, 5820, P < 0.01). The results support 
the view that the ESN(S) children are characterized by greater inter-individual variability as well 
as greater intra-individual variability in naming latencies when compared with the ESN(M) group. 
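
One way to carry out such a partition is sketched below. The paper does not spell out its computation, so this is an assumed approach: between-child variance is taken as the variance of the child means, within-child variance as the pooled variance about each child's own mean, and the groups are compared by variance ratios. The tiny data sets are hypothetical.

```python
def partition_variance(latencies_by_child):
    """Return (between_var, between_df, within_var, within_df) for one group."""
    k = len(latencies_by_child)
    n_total = sum(len(c) for c in latencies_by_child)
    child_means = [sum(c) / len(c) for c in latencies_by_child]
    mean_of_means = sum(child_means) / k
    between_var = sum((m - mean_of_means) ** 2 for m in child_means) / (k - 1)
    ss_within = sum(sum((x - m) ** 2 for x in c)
                    for c, m in zip(latencies_by_child, child_means))
    within_df = n_total - k
    return between_var, k - 1, ss_within / within_df, within_df

# Hypothetical latencies (sec) for three children per group, three responses each.
group_m = [[1.5, 1.7, 1.6], [1.8, 1.9, 1.7], [1.6, 1.6, 1.8]]
group_s = [[1.6, 2.4, 2.0], [2.9, 2.1, 2.6], [1.9, 2.8, 1.5]]

bm, bm_df, wm, wm_df = partition_variance(group_m)
bs, bs_df, ws, ws_df = partition_variance(group_s)
f_between = bs / bm  # variance ratio for between-child variance
f_within = ws / wm   # variance ratio for within-child variance
```

The reported degrees of freedom fit this scheme: 21 children give 20 between-child degrees of freedom per group, and 5752 - 21 = 5731 and 5841 - 21 = 5820 match the within-child values quoted above.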


Analyses of the effects of the experimental variables 


The Thorndike-Lorge (1944) frequency of the picture names was obtained. Twelve of the words 
in the list occurred at a frequency of 100 or more words per million (Group A), seven at a 
frequency of between 50 and 100 words per million (Group B), and nine at a frequency of less 
than 50 words per million (Group C). Nine words were assigned to each of two categories: high 
frequency, consisting of nine randomly selected Group A and B words; and low frequency, 
consisting of all the Group C words. 

The non-normality of the distribution of response latencies has already been noted. In order to 
improve normality, mean scores were obtained for each child on each trial for each of the two 
sets of pictures (with or without depth cues), and for each of the two word-frequency categories. 
Twenty mean scores were thus obtained for each child, a given score being the mean of that 
child’s response latencies in naming the nine pictures of a given word-frequency category and of 
a given set of drawings during a given trial. The mean of time data such as that gathered in this 
experiment is positively related to its variance. For a given child under a given experimental 
condition, the variance (and its associated mean) in picture-naming latencies will be the sum of 
two components: variance due to the condition itself and within-child ‘error’ variance. Mean 
scores for each child in each cell of the design, rather than medians or modes, were calculated 
on the assumption that, being most closely related to the variance, they would best reflect the 
effects of the experimental conditions, if these were present. 

The 840 mean response latencies obtained using this procedure were still significantly 
non-normal and were consequently normalized with a logarithmic transformation. A test of 
skewness (g1) and the Kolmogorov-Smirnov test of goodness of fit to a normal distribution 
(Sokal & Rohlf, 1969) were both significant (g1 = 0.436, s.e. = 0.085, P < 0.01; Dmax = 0.063, 
P < 0.01). The cell mean response latencies were logarithmically transformed, as this type of 
transformation is likely to make right-skewed response time data more normal (Winer, 1962; 
Guilford, 1965; Sokal & Rohlf, 1969). The transformation was effective in reducing skewness 
(g1 = 0.143, P > 0.05) and in making the distribution more normal (Dmax = 0.039, P > 0.1). 
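
The effect of the logarithmic transformation on skewness can be illustrated as follows. The sketch uses the simple moment coefficient of skewness and the large-sample approximation sqrt(6/n) for its standard error; both are assumptions made here (Sokal & Rohlf's g1 is computed from k-statistics, which differs slightly in small samples). The latencies are simulated from a log-normal distribution with arbitrary parameters, not taken from the experiment.

```python
import math
import random

def g1(values):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def se_g1_approx(n):
    """Large-sample approximation to the standard error of g1."""
    return math.sqrt(6.0 / n)

# 42 children x 2 drawing sets x 5 trials x 2 frequency categories.
n_means = 42 * 2 * 5 * 2

# Simulated right-skewed cell means (hypothetical parameters).
random.seed(1)
raw = [random.lognormvariate(0.6, 0.4) for _ in range(n_means)]
logged = [math.log(v) for v in raw]

skew_raw = g1(raw)
skew_log = g1(logged)
se = se_g1_approx(n_means)
```

For n = 840 the approximation gives a standard error of about 0.085, in line with the value reported above. On that basis g1 = 0.436 is roughly five standard errors from zero, whereas g1 = 0.143 is fewer than two.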

A four-way analysis of variance was conducted on the 840 log mean response latencies. The 
factors were as follows: factor A: group membership (ESN(M) vs. ESN(S)); factor B: presence 
of depth cues (with vs. without); factor C: trials (five); factor D: word frequency (high vs. low). 

As the second, third and fourth factors involved repeated measurements on the same children, 
there were 21 subjects per cell in the design. The design was adapted from one outlined by 
Winer (1962, p. 319). It is assumed that factors A and C are fixed factors and that factors B and 
D are random. The constraints imposed on the selection of words and their corresponding 
pictures for this experiment certainly make the stimuli non-random. However, as Clark (1973) 
has persuasively argued, words should be treated as a random effect ‘whenever the language 
stimuli used do not deplete the population from which they were drawn. Note that the answer is 
not whenever the language stimuli used were chosen at random from this population’ (p. 348). 
By treating words as a fixed effect, results obtained cannot be generalized to the language 
population, but only to the stimuli employed in the experiment. Clearly, the same argument 
applies to the population of object drawings. If the results of this experiment are to be 
considered as generalizable to more inclusive populations of words and pictures, the words and 
pictures must be considered to be random effects, particularly as it is possible to think of other 
words and pictures which could have been used instead. 

The summary of the analysis of variance is given in Table 1. Bartlett’s test of homogeneity of 
variance was used to test whether or not the interactions with children could be dropped from 
the model and pooled. The test indicated that they could indeed be pooled (χ² = 8.45, d.f. = 6, 
P > 0.1). The pooled interaction term, labelled ‘Initial pooled error (within)’ in Table 1, was 
used for conducting initial tests on the second- and third-order interaction terms in the analysis. 

Because there were two random factors in the model, the pooled error term was 
not always the appropriate denominator for the F ratios. In such cases, quasi-F ratios were 
constructed (Winer, 1962, p. 199) with associated degrees of freedom suggested by Satterthwaite 
(1946). These initial tests were considered to be appropriate since interaction terms between 
fixed and random factors may be denominators for F ratios of lower order interactions and main 
effects. If such interactions have very small degrees of freedom, the resulting F ratios will be of 
very low power. If preliminary tests on the higher order interactions in the model indicate that 
their effects are non-significant, they can be dropped from the model and their variance pooled 
with the error term. A more extended rationale for this procedure is given by Winer (1962). The 
preliminary tests conducted on the higher order interactions indicated that they were all 
non-significant and consequently they were pooled with the error term obtained from the 
interactions with subjects. A second stage of preliminary tests was then conducted on the 
first-order interactions. Three of these (AB, BC and BD) were clearly non-significant and were 
also dropped from the final model, their effects being pooled with the error variance to produce 
the final pooled error term shown in Table 1. This term was used as the denominator for the 
main effects of factors B (pictures) and D (words) and for the three remaining first-order 
interactions. The mean square of factor C (trials) was tested using the interaction term CD as the 
denominator. The test of significance of factor A (group membership) was more complex, and 
required a quasi-F ratio, of the following form: 


F’ = MS_A / (MS_{children within groups} + MS_{AD} - MS_{error}).


Degrees of freedom for the denominator were estimated to be 6. 
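
The arithmetic behind this quasi-F ratio can be checked with a short sketch. It assumes that the denominator combines the mean squares for children within groups, AD and the final pooled error from Table 1, and that the denominator degrees of freedom follow the usual Satterthwaite form; the variable names are illustrative only and are not taken from the paper.

```python
# Mean squares and degrees of freedom as listed in Table 1.
ms_a = 15.333                          # A (group membership)
ms_children, df_children = 1.203, 40   # children within groups
ms_ad, df_ad = 0.790, 1                # AD interaction
ms_error, df_error = 0.040, 783        # final pooled error (within)

# Assumed denominator of the quasi-F ratio: a signed sum of mean squares.
terms = [
    (+1, ms_children, df_children),
    (+1, ms_ad, df_ad),
    (-1, ms_error, df_error),
]
denominator = sum(sign * ms for sign, ms, _ in terms)
quasi_f = ms_a / denominator

# Satterthwaite approximation for the denominator degrees of freedom:
# (sum of a_i * MS_i)^2 divided by the sum of (a_i * MS_i)^2 / df_i.
df_denominator = denominator ** 2 / sum(
    (sign * ms) ** 2 / df for sign, ms, df in terms
)

print(f"quasi-F = {quasi_f:.2f}")
print(f"denominator d.f. = {df_denominator:.2f}")
```

Under these assumptions the ratio agrees with the tabled quasi-F for factor A, and the approximate degrees of freedom round to 6.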

Examination of Table 1 indicates that the main effects of group membership and word 
frequency were both significant, but both are qualified by significant first-order interactions. 
Means for all significant effects are shown in Table 2. 

Table 1. Summary of analysis of variance 

                                                        F ratio or
 Source                          Mean square   d.f.    quasi-F (†)      P
 Between children                    1.547       41         -           -
 A (Group membership)               15.333        1       7.85†       <0.05
 Children within groups              1.203       40         -           -
 Within children                     0.051      798         -           -
 B (presence of depth cues)          0.019        1        <1           -
*AB                                  0.029        1        <1           -
*B × children within groups          0.027       40         -           -
 C (trials)                          0.457        4       4.86          -
 AC                                  0.565        4      14.13        <0.01
*C × children within groups          0.043      160         -           -
 D (word frequency)                  3.634        1      90.85        <0.01
 AD                                  0.790        1      19.75        <0.01
*D × children within groups          0.060       40         -           -
*BC                                  0.011        4        <1           -
*ABC                                 0.044        4        <1†          -
*BC × children within groups         0.040      160         -           -
*BD                                  0.027        1        <1           -
*ABD                                 0.001        1        <1           -
*BD × children within groups         0.053       40         -           -
 CD                                  0.094        4       2.35          -
*ACD                                 0.026        4        <1†          -
*CD × children within groups         0.034      160         -           -
*BCD                                 0.069        4       1.73          -
*ABCD                                0.060        4       1.50          -
*BCD × children within groups        0.039      160         -           -
 Initial pooled error (within)       0.040      760         -           -
 Final pooled error (within)         0.040      783         -           -

* Effects pooled with error term to produce final pooled error. 


Table 2. Means for all significant effects (mean log response latencies) 

                              Trials
                  1       2       3       4       5      Total
ESN(M) group    0.544   0.442   0.392   0.297   0.278    0.390
ESN(S) group    0.647   0.679   0.643   0.665   0.669    0.661

                  Word frequency
                  High     Low
ESN(M) group     0.355    0.426
ESN(S) group     0.564    0.757
Total            0.460    0.591

The analysis indicates that ESN(M) children have significantly shorter response latencies in 
naming pictures than ESN(S) children. Furthermore, ESN(M) children show a marked reduction 
in response latencies over trials but the ESN(S) children do not. The levelling out of this trend 
for the ESN(M) children between trials 4 and 5 may be attributable to their reaching an optimal 
level of efficiency in responding by about the fourth trial. However, one cannot say on the basis 
of the evidence whether the interaction between trials and groups is due to learning or to 
motivational or attentional effects. 

The present study indicates that depth cues in pictures have no significant effect upon 
latencies. Although the main effect of word frequency was significant, high frequency words 
being named more rapidly than low frequency words, the effect is more pronounced in the 
ESN(S) than in the ESN(M) group, as Table 2 indicates. 


Discussion 


The results lend support to previous findings cited earlier regarding the relative slowness and 
heightened variability of response times in subnormal individuals and extend these findings to a 
new type of task. The more retarded ESN(S) group showed greater variability both within 
children and between children than did the ESN(M) group. 

Of the two major stimulus factors, the results indicate that word frequency is far more 
important than presence of depth cues in its effect on naming latency, a finding very much in line 
with that of Wingfield (1968), cited earlier. The finding that the word-frequency effect is shown more 
markedly in the more retarded ESN(S) group than in ESN(M) children is at first sight perhaps 
more difficult to account for. ESN(S) children, as Mein & O’Connor (1960) have shown, have 
smaller vocabularies than ESN(M) children and it seems likely that their vocabularies consist 
largely of relatively high frequency words. The words selected for this experiment, even those 
with a Thorndike-Lorge frequency of less than 50 words per million, were extremely common. 
Although the division of words into those of greater or lesser frequency than 50 words per 
million may have been sufficient to produce marked effects on the response latency for the 
ESN(S) children, the words in the relatively low-frequency category may have been too common 
to produce such a marked effect in the ESN(M) group. In other words, the findings might be 
seen as suggesting that a type of ceiling effect is operating, thus preventing the ESN(M) children 
from showing such a large difference in naming latency between high and low frequency words. 
Against such an interpretation, however, it must be emphasized that the ESN(M) and ESN(S) 
groups were initially matched on a measure of picture-naming vocabulary. There is, therefore, 
evidence against the view that the low-frequency names were, in a sense, too easy for the 
ESN(M) children. 

This leaves us with an explanation of the results in terms of a theory of semantic retrieval. 
The Oldfield model, further developed by Lachman, suggests that object names are stored in 
lexical memory according to some system related to their frequency, and that presumably a 
lexical search process, possibly involving binary decisions, is required to retrieve a particular 
name. As the frequency of names reduces, so a greater number of binary decisions would be 
required in order to retrieve them. As outlined earlier in this paper, one would predict from the 
model, taken together with evidence on the relative slowness of severely retarded individuals, 
that naming latency would show a greater rate of increase in ESN(S) children for a given 
reduction in word frequency. This is, of course, exactly what was found in this experiment. 

Further evidence comparing the naming latencies of various groups of individuals over a wider 
frequency range of object names would be necessary in order to support or to disconfirm such a 
theory of semantic retrieval. The picture will probably be found to be more complex than the 
one outlined above. For example, Lachman et al. (1974) have persuasively argued that lexical 
uncertainty and age of acquisition may be of equal or greater importance than word frequency, 
but that all three variables are closely related. This may also be found to hold for subnormal 
persons as well as for the undergraduate samples with which Lachman et al. were working. 
Furthermore, as they suggest, the decision processes involved in naming pictures may require 
not only a binary search of a lexical store but also the use of other memory components such as 
semantic stores for concepts or acquired knowledge. 

The results of the present experiment suggest, however, that the length of latency in naming 
pictures in subnormal children is related both to their degree of retardation and to the frequency 
of occurrence of the object names in the language. On a practical level, the findings suggest that 
stimuli for experiments involving relatively brief pictorial exposures may require careful 
selection. 


A comparison of the implication and repertory grid techniques 


Terry Honess 


The ‘repertory grid’ and the ‘implication grid’ techniques are compared using criteria based on an indirect 
measure of construct matching scores. The implication grid proved superior to the repertory grid under 
conditions designed to compare stability of matching scores and under conditions designed to compare 
sensitivity to subjects’ changes in construing. The significance of a further result, that the implication grid 
was superior for reflecting construct bipolarity, is considered in the light of conceptual problems with an 
assumption of bipolarity. The set of results is especially noteworthy since ‘indirect’ measures were formerly 
the exclusive province of the repertory grid. The discussion raises general issues about the relative utility of 
two grid formats, and the unique measurement possibilities of the implication grid are illustrated. It is 
concluded that the implication grid deserves careful consideration for use in studies concerned only with 
construct relationships. 


Kelly (1955) developed his ‘repertory grid’ method in the context of his work as a psychotherapist. 
The repertory grid (RG) provided a systematic framework for a client to describe his ways of 
discriminating (the constructs in a grid) between people important to him (the elements in a grid). 
Kelly assumed that each construct applied to a limited number of people and that each was 
bipolar. The most important RG measure was the degree of association between construct pairs. 
For example, the construct ‘aggressive-quiet’ would have a high degree of association (high 
matching score) with the construct ‘tense-relaxed’ if the people construed as ‘aggressive’ were 
usually construed as ‘tense’ and the people construed as ‘quiet’ were usually construed as 
‘relaxed’. The RG has now been applied to a variety of problems, some far removed from 
psychotherapy, but the emphasis on computing construct-matching scores remains. 

Working within the framework of Kelly’s ‘Personal Construct Theory’, Hinkle (1965) 
developed the ‘implication grid’ (IG) for examining the relations between constructs. Subjects 
answered questions of the form, ‘If you changed from being aggressive to quiet, what other 
constructs would have cause to be changed (e.g. tense to relaxed, ambitious to contented) by a 
change in yourself on this one construct alone’. This process was repeated so that each 
construct was paired twice with each of the others in order to generate a square matrix of 
interconstruct implications. Hinkle’s IG has unique measurement advantages over the RG 
(Bannister & Mair, 1968, ch. 3), but Ryle (1974, p. 120) notes that it is a complex task for 
subjects, and is one that has attracted little research. 

Fransella (1972) goes some way to meeting these criticisms by developing a modified form of 
the IG in the context of her construct theory approach to the treatment of stutterers. Fransella 
employed a card-sorting task in which each construct pole was considered separately. Subjects 
were presented with one card at a time (e.g. ‘aggressive’) and told that all they knew about a 
person was that he was ‘aggressive’. They were then asked which of the other qualities on the 
cards before them they would expect to find in a person who was ‘aggressive’. This procedure 
was repeated for all cards so that a matrix of implications between different construct poles could 
be generated. 

It is customary to distinguish between the RG and the IG on the grounds that the former is an 
indirect, and the latter a direct assessment technique. For example, Bannister & Mair (1968, 
p. 94) argue ‘It seems likely that neither approach supplants the other but rather supplements it. 
Kelly’s method (the RG) may uncover possible construct links of which the subject himself is 
unaware, while in the impgrid (IG) situation, only relationships construed by the subject can 
readily appear’. However, this distinction is not tenable for all IG measures. 

In the experiment reported here, the IG is used to provide an indirect measure of the degree 
of probability of association between each pair of constructs. It must be emphasized that this is 
a significant departure from normal practice, although Fransella (1972, Appendix I) does indicate 
that such a measure is possible using the IG. Indirect assessments of construct relationship 
scores have hitherto been the exclusive province of the RG. It is the aim of this study to 
compare the IG and RG on criteria based on construct-matching scores. Test-retest reliability, 
sensitivity to subjects’ changes in construing and adequacy in reflecting bipolarity of constructs 
are all examined. 


Reliability 

Personal construct theory is primarily concerned with evolution and change in construct 
organization. This has led Kelly to define reliability in terms of test characteristics that are 
‘insensitive’ to change. The same emphasis, coupled with the flexibility of the RG, has also led 
Bannister & Mair (1968) to employ a pejorative tone when speaking of ‘statistical platitudes’ in 
the context of discussion concerning reliability. Nevertheless, there are many reports of 
significant test-retest RG scores using adults as subjects (Bannister & Mair, 1968, provide a 
review). More recently, Lansdown (1975) provides good evidence that children (aged 9-11) also 
manifest significant retest construct-matching scores based on the RG. 

Mair (1964) offers clarification of the reliability issue by arguing that we must define the 
conditions under which stability or instability in grid scores may be anticipated. The constructs 
selected for the study reported here were commonly used descriptions and their subject-supplied 
opposites. Over a short period of time, little change in matching scores between this 
type of construct would be expected; superiority can therefore be claimed for the grid technique 
which yields the higher retest coefficient. 


Sensitivity to subjects’ changes in construing 


The possibility that a low retest coefficient produced by an individual may be due to his 
‘revising’ his construct system cannot be completely ruled out. A relevant empirical finding here 
is that ‘the source of a low reliability coefficient lies largely in radical matrix changes from one 
or two, out of a large number of constructs’ (Bannister & Mair, 1968). This is consonant with 
construct theory from which it would be predicted that change is limited to specific construct 
relationships that have been invalidated. It is recognized that it is impossible to completely 
separate score changes that are due to a subject revising his construct system from other score 
changes due to what we shall call ‘error variance’, but if a subject does revise his construct 
system over a short time period, this revision will be largely indexed by a change in just one 
construct’s matching scores. In this study, the more sensitive grid format was therefore defined 
as the one that yields the greater improvement in retest coefficients following the elimination of 
the least stable construct. 


Bipolarity 

The difficulties of translating the important theoretical assumption of bipolarity of constructs into 
practical RG measurement have been illustrated by Mair (1967) and Epting, Suchman & Nickeson 
(1971). An examination of the relative adequacy of the IG in reflecting construct bipolarity is 
therefore of particular interest. 


Structure scores 


A variety of overall structure scores based on construct-matching scores have been developed 
using the RG, but only one score, ‘intensity’, is computed for both grids in this study. Intensity, 
an estimate of overall matching strength, is used frequently, particularly in studies concerned 
with the process of schizophrenic thought disorder. Bannister (1962) reports a low retest 
coefficient and a tendency for intensity scores to increase following an immediate RG retest. 
Intensity retest coefficients and score change trends are compared for both grids in this study. 
The total number of implications for each individual’s IG was also computed. This measure has 
been employed to index aspects of cognitive organization (Fransella, 1972; Crockett & Meisel, 
1974) but no studies of its reliability have appeared in the literature. 


Equivalence of grid forms 


Mair & Boyd (1967) demonstrated that the ‘rank-order’ and ‘split-half’ forms of the RG need 
not provide equivalent estimates of construct relationships. However, we must predict significant 
shared variance between the IG and RG matching scores since, in this instance, they purport to 
measure the same aspect of construct organization, viz. an indirect measure of the degree of 
probability of association between each pair of constructs. 


Method 
Subjects 


Two matched class groups from a state comprehensive school. The mean overall age for class group 1 (13 
boys and 10 girls) and class group 2 (15 boys and 13 girls) was 12.8 years (S.D. = 0.3). 


Descriptions used in the grids. 50 children matched in all respects with the groups described above 
completed two essays about liked and disliked same sex peers. Twelve descriptions which frequently 
occurred in these essays were supplied to each child in this study who was asked to specify what he (she) 
would use as the opposite of each word in describing his (her) same sex peers. This is the ‘opposite method’ 
for eliciting bipolar constructs (see Epting et al. 1971). The supplied descriptions were big head, kind, quiet, 
babyish, friendly, show off, dumb, bit of a snob, sensible, horrible and very nice. Eight descriptions were 
used in each grid: four supplied descriptions and their elicited opposites were chosen for every child. 


The grids 

Both grids consisted of ten-page booklets. Instructions were printed on page 1. Page 2 concerned questions 
about the descriptions ‘nice’ or ‘good’ which were used to familiarize subjects with the procedure. The 
remaining eight pages concerned the eight descriptions which constituted the four bipolar constructs under 
examination. 


RG. A group form of an 8 × 8 ‘rank order grid’ was used. The elements were eight photographs, each lettered 
A to H, of unknown same sex peers. Each subject was required to rank order the photographs using each 
description, e.g. ‘The boy in photo B is the most friendly, the boy in photo G the next most friendly’, and 
so on. Typically, enlarged photos are placed at the front of the group, but in this study separate sets of 
photos were supplied to each child to enable him to work quietly and entirely at his own pace. The same 
photos were used for both administrations which will inflate reliability coefficients but, as Slater (1969) has 
noted, to monitor change both constructs and elements must remain the same on repeat sessions. 

This particular RG format was selected because Ravenette (1964) notes that rank ordering was the ‘natural 
approach’ adopted by the children in his study. Furthermore, most of the studies with children have 
employed this type and size of grid (e.g. Wooster, 1970; Lansdown, 1975; also see Ravenette, 1975). Finally, 
it may be noted that photos of unknown boys and girls are ideal RG ‘elements’ to match the IG format 
which questions children’s perceptions of an ‘unknown’ new boy/girl. 


IG. The instructions stressed that new boys (girls for female subjects) were to be considered, about whom 
only one thing was known. At the top of each remaining page was written a statement such as ‘The new boy 
is helpful’. This was followed by seven questions (concerning the remaining grid descriptions) of the form, 
‘If the new boy is helpful will he be dumb?’ Subjects were required to tick the appropriate answer box of 
‘very likely’, ‘may or may not’ or ‘very unlikely’. The booklets were summarized by the experimenter into a 
grid as shown in the example in Fig. 1. Descriptions 1 & 5, 2 & 6, 3 & 7 and 4 & 8 were the subject’s four 
‘bipolar constructs’. The implications in the leading diagonal were assumed to be ‘very likely’. For this 
child, if a new boy is a ‘show off’ he is very likely to be ‘brainy’, he is very unlikely to be ‘quiet’ or 
‘dumb’ or to ‘stay out of the way’, and he may or may not be ‘helpful’ or a ‘trouble maker’ or ‘only help 
himself’. 

[Figure 1 grid. Constructs (rows and columns), in order: Quiet; Show off; Helpful; Dumb; Trouble maker; Stays out of the way; Only helps himself; Brainy.] 

Figure 1. An implication grid. Columns indicate what a description implies and rows indicate the descriptions 
by which a given construct is implied; ‘very likely’ implications are indexed by ticks, ‘very unlikely’ by 
crosses, and ‘may or may not’ by blank cells. 


Procedure 


Class group 1 completed the RG first, followed by the IG one week later and then the RG again four weeks 
after that. The order for class group 2 was IG, RG, IG with the same time intervals. 


Scoring 


RG. The procedures for computing matching scores between descriptions and intensity (overall matching 
strength) are described by Bannister & Mair (1968). Bipolarity between any two descriptions was indexed by 
r < -0.83 (P < 0.01) and positive association was indexed by r > +0.83. 


IG. ‘Very likely’ was scored 2, ‘may or may not’ 1, and ‘very unlikely’ 0. The match between 
each pair of descriptions was computed by summing the difference scores from both rows and columns in 
the same fashion as the matches between rows are scored in a rating form of the RG (Bannister & Mair, 
1968, p. 64). For example, the overall difference score (D) between ‘show off’ and ‘trouble maker’ in Fig. 1 
is only 9, which reflects a high match between these two descriptions. Now ‘may or may not’-‘may or may 
not’ cell matches spuriously contribute to a low D score, therefore each D score was multiplied by 16/(16 - M) 
(16 is the total number of row and column comparisons for a pair of descriptions in an 8 × 8 grid and M is the 
number of ‘may or may not’-‘may or may not’ cell matches). Intensity was also computed in the same 
manner as for the RG: all D score deviations from 16 (the average D score in the possible range 0-32) were 
summed ignoring sign. In addition, the total number of implications in each individual’s grid was computed. 
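
The IG scoring rule can be sketched in code. This is a minimal illustration rather than the author's own procedure: it assumes that the row comparison and the column comparison each run over all eight cells, that the correction is applied to the raw difference score exactly as described, and that the grid is stored as a list of lists; the function names and the example grid are hypothetical.

```python
from itertools import combinations

def match_score(grid, i, j):
    """Corrected difference score D between descriptions i and j.

    grid[r][c] holds 2 ('very likely'), 1 ('may or may not')
    or 0 ('very unlikely').
    """
    n = len(grid)
    # Compare the two rows cell by cell, then the two columns cell by cell.
    pairs = [(grid[i][k], grid[j][k]) for k in range(n)]
    pairs += [(grid[k][i], grid[k][j]) for k in range(n)]
    raw_d = sum(abs(a - b) for a, b in pairs)
    # M: comparisons in which both cells are 'may or may not'.
    m = sum(1 for a, b in pairs if a == 1 and b == 1)
    if m == len(pairs):
        raise ValueError("every comparison is a 'may or may not' match")
    return raw_d * len(pairs) / (len(pairs) - m)

def intensity(grid):
    """Sum of absolute deviations of all pairwise D scores from the midpoint."""
    n = len(grid)
    midpoint = 2 * n  # 16 for an 8 x 8 grid (possible range 0-32)
    return sum(abs(match_score(grid, i, j) - midpoint)
               for i, j in combinations(range(n), 2))

# Hypothetical grid: 'very likely' on the leading diagonal,
# 'may or may not' everywhere else.
example = [[2 if r == c else 1 for c in range(8)] for r in range(8)]
print(match_score(example, 0, 1))  # raw D = 4, M = 12, so 4 * 16 / (16 - 12)
print(intensity(example))
```

In this hypothetical grid every pair has a corrected D at the midpoint of 16, so intensity is zero: a grid made up almost entirely of ‘may or may not’ answers carries no matching information after the correction.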

Bipolarity between any two descriptions was indexed by D ≥ 23 (P < 0.01) and positive association was 
indexed by D ≤ 9. The mathematical procedure for arriving at these D scores is as follows. The possible 
difference scores between each pair of cells are 0, 1 and 2. On a chance basis, the likelihood of a 0 score is 
1/4, the likelihood of a 1 score is 1/2, and the likelihood of a 2 score is 1/4 (remember there is a correction 
for ‘might’-‘might’ matches). Hence, the likelihood of an overall D score of 0 = (1/4)^16 × (16!/16!), the 
likelihood of a D score of 1 = (1/2) × (1/4)^15 × (16!/15!1!), and the likelihood of a D score ≤ 1 = the sum of these 
probabilities. The sum of the likelihoods of D scores ≤ 9 gives a P value of 0.00996 or 0.01 (like the RG 
criteria, these P values are for a one-tailed test). It is recognized that the criterion level for bipolarity is 
based on an approximation of the distribution for the corrected D. Nevertheless, it reveals the same pattern 
of results as the ‘index of relative bipolarity’ (see below). 
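
The chance distribution of D described above can be built up numerically. The sketch below assumes 16 independent cell comparisons, each with the stated probabilities of 1/4, 1/2 and 1/4 for differences of 0, 1 and 2, and takes the criterion tails to be a D score of 9 or less and of 23 or more; the function name is illustrative.

```python
from fractions import Fraction

# Chance probabilities of a single cell difference of 0, 1 or 2,
# as stated in the text (after excluding 'might'-'might' matches).
cell = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def d_distribution(n_cells=16):
    """Exact chance distribution of D, the sum of n_cells cell differences."""
    dist = {0: Fraction(1)}
    for _ in range(n_cells):
        new = {}
        for total, p in dist.items():
            for diff, q in cell.items():
                new[total + diff] = new.get(total + diff, Fraction(0)) + p * q
        dist = new
    return dist

dist = d_distribution()
p_low = sum(p for d, p in dist.items() if d <= 9)    # positive association tail
p_high = sum(p for d, p in dist.items() if d >= 23)  # bipolarity tail

print(float(dist[0]), float(dist[1]))
print(float(p_low), float(p_high))
```

Under these assumptions the two tails are equal by symmetry and each comes to about 0.010, close to the reported value of 0.00996.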


IG & RG. For each child the matching scores derived from both types of grid were rated from 1 (highest 
match) to 28 (lowest match). Two sets of scores were based on this data: (i) An index of relative bipolarity. 
The mean rank of the matches between the poles of the four ‘bipolar’ constructs was computed for each 
child. A high mean rank would indicate that the four are relatively bipolar compared to other pairs 
of descriptions within the child’s system. (ii) An index of relative stability of descriptions based on retest 
data. The seven ranks based on the matching scores for each description on test session 1 were subtracted 
from the corresponding ranks on session 2. The sum of these differences, disregarding sign, is a measure of 
stability relative to other difference scores derived from the same subject’s grid. The greater this sum, the 
more ‘unstable’ is the description. 
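
Both rank-based indices can be illustrated with a short sketch. It makes some assumptions that the text does not state: tied matching scores are given their mean rank, descriptions are numbered 0 to 7 so that the bipolar pairs are (0, 4), (1, 5), (2, 6) and (3, 7), and all function names and example scores are hypothetical.

```python
from itertools import combinations

def average_ranks(scores, highest_match_is_low):
    """Rank pairs from 1 (highest match); tied scores share their mean rank.

    Set highest_match_is_low=True for IG D scores (low D = high match)
    and False for RG correlations (high r = high match).
    """
    ordered = sorted(scores, key=scores.get, reverse=not highest_match_is_low)
    ranks, pos = {}, 0
    while pos < len(ordered):
        end = pos
        while (end + 1 < len(ordered)
               and scores[ordered[end + 1]] == scores[ordered[pos]]):
            end += 1
        for pair in ordered[pos:end + 1]:
            ranks[pair] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks

def relative_bipolarity(ranks, bipolar_pairs):
    """Mean rank of the matches between the poles of the bipolar constructs."""
    return sum(ranks[p] for p in bipolar_pairs) / len(bipolar_pairs)

def description_instability(ranks_1, ranks_2, description):
    """Sum of absolute rank changes over the seven pairs containing a description."""
    return sum(abs(ranks_1[p] - ranks_2[p]) for p in ranks_1 if description in p)

# Hypothetical D scores for the 28 pairs of eight descriptions.
pairs = list(combinations(range(8), 2))
bipolar = [(0, 4), (1, 5), (2, 6), (3, 7)]
d_scores = {p: (30.0 if p in bipolar else 12.0) for p in pairs}
ranks = average_ranks(d_scores, highest_match_is_low=True)
print(relative_bipolarity(ranks, bipolar))
print(description_instability(ranks, ranks, 0))
```

With mean ranks for ties, a set of 28 ranks always averages 14.5, so an index well above that value marks the four constructs as relatively bipolar within the child's grid.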


Results 
Reliability of matching scores 


Each subject’s eight grid descriptions yield 28 matching scores. Pearson’s r was computed for 
each subject’s matching scores across test-retest sessions. The average retest r for the IG was 
0.82 (n = 24) and for the RG was 0.66 (n = 17). The difference between the two sets of 
correlations is statistically significant: Mann-Whitney U = 301, z = 2.57, P = 0.01, two-tailed test. 


Sensitivity to subjects’ changes in construing 


The least stable description was eliminated from all grids used for the reliability assessment, 
leaving 21 matching scores for each subject’s grid. The new average retest coefficients were 0.89 
for the IG and 0.75 for the RG. The superiority of the IG in eliciting reliable matching scores 
was increased (U = 310.5, z = 2.818, P = 0.005, two-tailed test). 
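
The z values reported for the two Mann-Whitney comparisons follow from the usual normal approximation to U. The sketch below assumes no correction for ties or continuity, and assumes that the group sizes (24 IG and 17 RG grids) were the same in both comparisons; the function name is illustrative.

```python
import math

def mann_whitney_z(u, n1, n2):
    """Normal approximation to the Mann-Whitney U statistic (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u

# Group sizes as reported: 24 IG and 17 RG retest grids.
z_full = mann_whitney_z(301, 24, 17)       # all 28 matching scores
z_reduced = mann_whitney_z(310.5, 24, 17)  # least stable description removed

print(round(z_full, 2), round(z_reduced, 3))
```

Under these assumptions the approximation reproduces both reported z values to the precision given in the text.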

It is relatively difficult to ‘improve’ on an already high correlation coefficient so only those 
children whose full grids yielded reliability coefficients of 0.80 or less were included in the 
following comparison. After elimination of the least stable descriptions from all relevant grids, 
the average improvement in z’ scores (all rs were transformed to z’ scores) for the IG group was 
0.422 (n = 9) and for the RG group 0.183 (n = 12). The IG showed significantly greater 
improvement in the reliability of matching scores (U = 24, P < 0.05, two-tailed test) which 
reflects the IG’s greater sensitivity. 


Bipolarity of constructs 


(a) If the grids adequately reflect construct bipolarity, a high proportion of descriptions and 
their elicited opposites (the ‘bipolar constructs’) should show a significant bipolar relationship. 
Of the 184 (46 subjects × 4) scores yielded by the RG, 53 (28.8 per cent) met the criterion for bipolarity 
and 2 (1.1 per cent) the criterion for agreement. Of the 188 (47 subjects × 4) scores yielded by the 
IG, 107 (56.9 per cent) were significantly bipolar and 4 (2.1 per cent) significantly in agreement. 
The difference between the proportions of bipolar constructs yielded by the two 
techniques is highly significant (z = 5.51, P < 0.001). 

(b) Both grids yield a mean rank of the matches between the poles of the four constructs. 
These were compared for each subject using the Wilcoxon matched-pairs test. The ranks from 
the IG were significantly higher, i.e. reflected greater bipolarity (T = 121, z = 2.85, P < 0.005, 
two-tailed test, n = 34). 

(c) The raw data of each IG allows a screen for bipolarity. For a bipolar construct X-Y, the 
questions ‘If X then Y?’ and ‘If Y then X?’ should both be answered ‘very unlikely’. Of the 188 
constructs under examination, 118 fulfilled this condition. Of these 118, 100 (84.7 per cent) met the 
criterion for significant bipolarity. The remaining 18 were also relatively bipolar within the 
relevant individuals’ matrices (mean rank = 19.28). It is noteworthy that for the four constructs 
which showed significant agreement between poles, the answers to the questions, ‘If X then Y?’ 
and ‘If Y then X?’ were all ‘very likely’. 


Overall structure scores 


The reliability coefficient for RG intensity scores was low (r = 0.35, n = 17, n.s.) in contrast to 
the significant coefficient for intensity yielded by the IG (r = 0.62, n = 24, P < 0.01). The retest 
coefficient for the ‘total implication’ score, derived from the IG, was also high (r = 0.79). Both 
IG and RG intensity scores were greater on retest, but in neither case was the trend statistically 
significant. 


Equivalence of grid forms 

The mean correlation between the 28 matching scores derived from the two different techniques 
for the same group of children was 0.50 (n = 41). For 35 children, the test-retest period was one 
week, for the remainder it was four weeks. 


Discussion 

Reliability and sensitivity 

The matching scores based on both the RG and IG show significant consistency over a five week 
period. The marked superiority of the IG reliability scores may be attributed to one of the 
following. First, it might be argued that the IG is less susceptible to random fluctuations, or 
second, it might be argued that the IG is less sensitive to subjects’ revisions of their construct 
systems. A third possibility, that the grids are indexing different aspects of construct systems, is 
discussed under ‘direct vs. indirect assessment’. 

The first explanation is plausible insofar as the IG scores are based on more information: both 
‘row’ and ‘column’ matches; whereas RG scores are based on only part of the grid data since 
‘column’ (element) matches are ignored. Furthermore, the second explanation fails to find 
support from the statistical analysis of grid sensitivity. Indeed, there is good evidence that the 
IG is more sensitive to subjects’ revisions of their construct systems. However, future research 
should try to arrange for or detect invalidation of particular constructs in order to compare their 
stability with other constructs (e.g. Bannister’s, 1963, 1965, serial invalidation experiments with 
the RG). Nevertheless, within the limits of the experiment reported here, the argument that the 
IG yields more reliable matching scores and is more sensitive to system revisions than the RG 
carries most credence. 

Intensity scores are based on matching scores, so it is to be expected that the IG intensity 
scores are more reliable than those based on the RG. This finding may have ramifications for the 
RG’s use in clinical work. The superior reliability of the IG intensity scores suggests that the IG 
may offer greater discrimination between thought-disordered schizophrenics and other clinical 
groups (see Bannister, 1962; Bannister & Fransella, 1966). Indeed, in the context of 
schizophrenic thought disorder, the IG appears to be uniquely useful. It provides a measure of 
‘internal inconsistency’ (see below) and does not include elements so there are no ‘complex 
visual stimuli’ (see Frith & Lillie, 1972, and Haynes & Phillips, 1973, and their criticisms of 
Bannister’s use of the RG). 


Bipolarity 

In the analysis of construct bipolarity, it was assumed that subjects should use the opposite poles 
of their constructs as mirror images of each other. This criterion for bipolarity has been used in 
previous studies (e.g. Epting et al. 1971) and it is one that is assumed in some RG analyses (e.g. 
Gibson, 1975). Nevertheless, there are conceptual problems with an assumption of construct 
bipolarity and these must be briefly considered. 

In a discussion of this issue, Mancuso (1970, p. 299) concludes that ‘many terms have a 
nebulous “nonentity”’ as their other pole and that not every construct is of an A-B (single pole — 
single contrast pole) quality’. Support for this argument can be drawn from Ogden’s (1967) 
linguistic analysis of the complexities of ‘opposition’, the failure of Green & Goldfried (1965) to
find support for a bipolar model of semantic space, and Mair’s (1967) investigation using the RG. 
Mair observes ‘it seems possible, and even probable, that people, when describing others, do not 
use specific oppositionality as suggested by the use of single verbal labels, but may rather use 
amalgamations of ideas in some form of generalized contrast’. In short, it could be argued that 
the subjects in this experiment should not be expected to manifest strict bipolarity. 

If our bipolar comparison can be criticized on these grounds then this aspect of our results is 
certainly of less importance. However, the bipolar constructs were elicited using the ‘opposites’
method: ‘What do you regard as the opposite of this description?’ (see ‘method’ section). The
subjects’ responses to this question might therefore be classified as an external criterion against 
which the IG proved superior to the RG. A counter-argument to this proposal, that the RG is 
studying construct relationships of which the subjects may not be aware, is discussed in the next 
section.


Direct vs. indirect assessment 


In our introduction to the grid methods, a distinction was drawn between direct and indirect 
assessments of construct relationships. The IG is generally classified as a ‘direct’ method 
because it is based on questions of the form ‘If construct X, how likely is the presence of 
construct Y?’. In contrast, the RG attempts an ‘indirect’ assessment of construct interrelations 
by requiring the subject to portray the way in which different constructs are used across a
range of different persons. In spite of this general distinction between the two techniques, the IG
measure computed in this experiment is based on an indirect assessment and is therefore directly 
comparable to the RG measure. 

The IG measure is indirect because it is not based on a subject’s estimate of the strength of
the relationship between ‘X’ and ‘Y’ as in the IG employed by Crockett & Meisel (1974), or the 
methods used in studies of implicit personality theories (e.g. Hays, 1958; Bruner, Shapiro & 
Tagiuri, 1958). It is, instead, based on a comparison of the way in which ‘X’ and ‘Y’ are 
employed across a wide range of questions, in a similar fashion to the RG. This argument is 
made most explicit by the procedure for the computation of the IG measure, which is the same 
as that developed for the rating form of RG (see ‘method’ section). 

Finally, the argument that the grids index the same aspects of construct organization (as far as 
matching scores are concerned) receives support from the significant shared variance between 
the two sets of grid scores. This was not especially high, but it must be emphasized that it is at 
the same level as that reported by Mair & Boyd (1967) in their comparison of two forms of the 
RG. 


Contrasts 


Notwithstanding the grids’ equivalence in computing construct matching scores, the grids may be 
contrasted on several other measures. First, Bannister & Mair (1968, p. 96) note that the RG, in 
contrast to the IG, ‘cannot reveal contradictory implications because the mathematics of grid 
format and analysis cannot accept it — the subject is forced into either consistency or “no 
relationship”’. Hinkle (1965) argues that his IG indicates ‘logical inconsistency’ when a line of
implications has not been extended, i.e. when A implies B, B implies C, but A fails to imply C 
(p. 63). However, such a pattern may arise from differences in the strength of implications 
between different construct pairs, rather than any logical inconsistency between them. 
‘Inconsistency’, or what Bannister & Mair (1968, p. 96) have termed ‘contradiction’, is much 
easier to pinpoint if both ‘very likely’ and ‘very unlikely’ implications are indexed, as is the 
case in the grid format developed in this study. 

An illustration of ‘contradiction’ in construing may be drawn from the system presented in
Fig. 1. Consider the three constructs ‘brainy’, ‘show off’ and ‘trouble maker’. The subject
indicates that a boy who is ‘brainy’ is very unlikely to be either a ‘show off’ or a ‘trouble
maker’, and that a boy who is a ‘trouble maker’ is very likely to be a ‘show off’ and very unlikely to
be ‘brainy’. So far there is no ‘contradiction’: the system is ‘logically consistent’. But the
subject also argues that a boy who is a ‘show off’ may or may not be a ‘trouble maker’, but is
‘very likely’ to be ‘brainy’. This is inconsistent; a consistent response would be that a boy who
is a ‘show off’ is either very unlikely to be ‘brainy’ or may or may not be ‘brainy’.
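
The check applied in this illustration can be sketched in a few lines of code. The sketch below is only one possible reading of ‘contradiction’: it assumes a three-point coding (+1 for ‘very likely’, 0 for ‘may or may not’, -1 for ‘very unlikely’) and flags any pair of constructs whose two directions of implication carry opposite signs. The coding, the dictionary layout and the function name are illustrative assumptions, not part of the grid procedure itself.

```python
# Assumed coding: +1 = 'very likely', 0 = 'may or may not', -1 = 'very unlikely'.
# Keys are (antecedent, consequent) pairs taken from the worked example in the text.
implications = {
    ("brainy", "show off"): -1,
    ("brainy", "trouble maker"): -1,
    ("trouble maker", "show off"): +1,
    ("trouble maker", "brainy"): -1,
    ("show off", "trouble maker"): 0,
    ("show off", "brainy"): +1,
}


def find_contradictions(ratings):
    """Return construct pairs whose two directions of implication have opposite signs."""
    found = []
    for (x, y), forward in ratings.items():
        backward = ratings.get((y, x))
        # Visit each unordered pair once; skip pairs rated in one direction only.
        if backward is None or x > y:
            continue
        if forward * backward < 0:
            found.append((x, y))
    return sorted(found)


contradictions = find_contradictions(implications)
print(contradictions)
```

Under these assumptions only the ‘brainy’ and ‘show off’ pair is flagged, which matches the inconsistency described above. A ‘may or may not’ response in one direction is never treated as contradictory.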

A detailed discussion of the significance of ‘contradiction’ is beyond the scope of this article, 
but several points may be noted. Bannister & Mair (1968) have suggested that ‘psychological
concepts such as “conflict” might be examined in terms of manifest implicative contradictions’.
This is consistent with Kelly’s assumption that construing simultaneously involves an act of 
prediction (1955, pp. 119-125); contradiction in construing therefore entails an inability to predict 
and influence events. Furthermore, Mehrabian (1968, pp. 144-152) argues that this state of affairs 
is a precursor of change and development in cognitive organization. Hence, it may be speculated 
that the subject in our example is beginning to differentiate between the ‘showing off’ and 
‘trouble making’ aspects of others’ personality and behaviour. 

The grids may also be contrasted on the grounds that most RG studies have neglected to take 
account of the hierarchical nature of construct systems (discussed by Ryle, 1974, pp. 120-122). 
Furthermore, the few attempts to provide indirect measures of construct ‘superordinacy’ (see
Kelly’s ‘organization corollary’, 1955, pp. 56-59), based on factor analysis of the RG, have
proved difficult to establish (Bannister & Mair, 1968, p. 206). Other problems with RG measures
of hierarchical organization are discussed by Adams-Webber (1970) and Honess (1976). In 
contrast, the IG provides a direct measure of ‘superordinacy’, based on number of implications 
carried by each construct, which has received promising empirical validation from the few 
studies in which it has been employed (Hinkle, 1965; Fransella, 1972; Crockett & Meisel, 1974). 
This IG measure receives further empirical support from studies by the author (in preparation) 
involving university students’ aesthetic judgements (‘likely’ and ‘unlikely’ implications were 
indexed) and a developmental analysis of children’s perceptions of their peers. 
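
As a rough illustration of a superordinacy index based on the number of implications carried by each construct, the following sketch counts, for each construct, how many other constructs it implies as either ‘very likely’ or ‘very unlikely’. It is an assumption of this sketch that ‘may or may not’ responses carry no implication and that only the antecedent direction is counted; the three-point coding and the function name are likewise hypothetical. The ratings are those of the three-construct example discussed earlier.

```python
from collections import Counter

# Assumed coding: +1 = 'very likely', 0 = 'may or may not', -1 = 'very unlikely'.
implications = {
    ("brainy", "show off"): -1,
    ("brainy", "trouble maker"): -1,
    ("trouble maker", "show off"): +1,
    ("trouble maker", "brainy"): -1,
    ("show off", "trouble maker"): 0,
    ("show off", "brainy"): +1,
}


def implication_counts(ratings):
    """Count the definite (non-zero) implications carried by each antecedent construct."""
    counts = Counter()
    for (antecedent, _consequent), rating in ratings.items():
        counts[antecedent] += 0  # make sure every construct appears, even with no implications
        if rating != 0:
            counts[antecedent] += 1
    return dict(counts)


counts = implication_counts(implications)
# Rank from most to least superordinate; ties are broken alphabetically.
ranked = sorted(counts, key=lambda construct: (-counts[construct], construct))
print(counts, ranked)
```

With so few constructs the ranking is barely informative; the sketch is meant only to show the form of the computation.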

The principal advantage of the RG over the IG, as currently used, is that it encompasses an
analysis of both the constructs and the elements construed. Indeed, many data are ignored if only
construct relations are examined using the RG. Slater (1969) makes this point forcibly: ‘They 
(Bannister & Mair, 1968) disregard the fact that a (repertory) grid exhibits an interaction system 
and that the relations between the constructs are defined by their locations in an element-space. 
They do not examine the relations between the elements or of the elements with the constructs. 
Consequently they waste most of the information in their data, large parts of it entirely’. As 
noted earlier, there is no ‘waste of information’ in computing matching scores using the IG. 

The value of the RG analysis of the way in which ‘elements’ (usually people) are located in a 
subject’s construct system, has been amply demonstrated in a number of studies (e.g. Ryle, 1974). 
However, there is no reason why people known to the subject cannot be included in the IG. For 
example, ‘like I would like to be’ or ‘like mother’ might be used as ‘supplied constructs’. 
Alternatively, different persons may be used as the focus of the implications between constructs. 
The latter proposal has already been explored by Fransella (1972) in her analysis of constructs 
related to clients’ perceptions of self as a stutterer compared to clients’ perceptions of self as a 
non-stutterer. Nevertheless, it must be stressed that the value of the IG for indexing element 
relationships is largely speculative, but the argument does serve to underline the flexibility of the 
IG. 


Generality of results and conclusions 


The IG is flexible insofar as any person or other event may be used as the focus of implications 
between any type of construct. The RG is, if anything, more flexible because of the number of 
different formats that are available (see Bannister & Mair, 1968, ch. 2, for some of these). Thus, 
it may be that our results will be replicated for some versions of these methods, for some 
populations, but not necessarily for all. 

In spite of these qualifications, available evidence does not suggest that alternative forms of 
the RG are likely to prove superior to the IG on one of our criteria: the reliability of matching 
scores. Mair & Boyd (1967) report that the ‘split half’ RG format (using 20 photographs) and a 
‘rank-order’ format (using ten photographs) provide a similar degree of consistency in matching 
scores for 16-year-old males, with coefficients in the range 0.43 to 0.72. Furthermore, in a general
review of the reliability of RG matching scores, Bannister & Mair (1968, p. 160) conclude: ‘It can
be said that using elements such as people known personally to the subject, with supplied
constructs of a conventional type and with either a rank-order or split-half matching
administration, normal subjects, doing repeat grids, on either the same or different elements,
tend to yield coefficients of reliability which fall largely within the range 0.6 to 0.8’. Thus,
average RG retest coefficients based on adults’ ratings of known or unknown others fail to
exceed the average coefficient reported for the 8 × 8 IG employed in this experiment.
Furthermore, a bigger IG is likely to boost the reliability coefficients still higher. 

It must also be acknowledged that the choice of children as subjects for this experiment is 
unusual in the context of general RG work; most studies have used adults as subjects. However, 
there is no readily apparent reason why a replication of the present study, using older subjects, 
should not provide a similar pattern of IG and RG matching scores. 

The discussion has raised a number of general issues concerning the relative utility of the IG 
and the RG. This is important because repertory grids of all forms are increasingly in use and 
comparisons with alternative techniques are therefore badly needed. The data presented in this
study, and the potential of the IG to index ‘contradiction’ in construing and construct
‘superordinacy’, suggest that the IG deserves careful consideration for use in studies concerned
only with construct relationships.


Imagery bizarreness in children’s recall of sentences


R. Merry and N. C. Graham 





Bizarre mental images have been advocated as ‘artificial’ memory aids for students since the time of the 
ancient Greek teachers of rhetoric, yet the few recent experiments in this area, mostly on adults, have 
generally found bizarreness to be unimportant. In this experiment, 108 12-year-olds recalled words from
sentences they had rated as producing bizarre images significantly better than they recalled the same words
from sentences rated as producing ordinary images. This was true for both expected and unexpected
immediate recall, and for unexpected long-term recall (P < 0.01 in all cases). A tentative explanation is
offered in terms of a cognitive approach to perception itself, suggesting that bizarre images might have 
properties similar to Berlyne’s ‘collative’ stimulus variables, involving subjects at something close to ‘the 
optimum level of arousal’. 


Ancient Greek teachers of rhetoric were the first known advocates of mental images as powerful 
systematic memory aids for their students, and their advice has been passed on over the 
centuries (Yates, 1966). The recent re-emergence of imagery as a respectable subject for study 
has gone a long way towards establishing its nature and importance as a mediation process (e.g. 
Paivio, 1971). The ‘image-evoking value’ of a word is now widely accepted as a very influential 
factor in its recall. But are all images equally useful in this respect, or might some be better than 
others? The ancients were in no doubt that the best images were bizarre rather than 
commonplace, and modern writers have continued to claim that memory images should be ‘as 
vivid, striking and fantastic as possible. . . far-fetched, exaggerated and distinctive’ (Hunter, 
1964), ‘the more bizarre and unlikely the better’ (Gombrich, 1972), or ‘exaggerated (e.g. 
grotesquely large), absurd, ridiculous. . .imaginative in any way’ (Buzan, 1974). 

Yet in spite of all this advice, there has been only a little actual research, almost always using 
adult subjects, on bizarreness in images, and virtually no evidence that it is a significant factor in 
recall. One of the few positive findings was by Delin (1968), who asked students to write down 
bizarre images to link pairs of nouns. Those who did so remembered significantly more than 
those who produced more ordinary images, when recall was tested 15 weeks later. 

Most other experiments have actually denied the importance of bizarreness. Wood (1967) 
found no significant difference in recall for subjects asked to form ‘common’ images and those 
asked to form ‘bizarre’ images to link noun pairs. However, some subjects had been expected to 
link pairs like ‘sugar—platter’ with a bizarre image, while others had to relate ‘diaper—coal’ with 
a commonplace one. In such cases, it may have been very difficult for subjects to comply with 
instructions; a similar criticism can be made about a study by Collyer, Jonides & Bevan (1972), 
who claimed that bizarre images were inferior to plausible ones. Moreover, Paivio & Yuille 
(1969) showed that subjects follow mediation instructions only if they are appropriate to the 
materials, and appear to facilitate recall. 

Nappe & Wollen (1973) tried to avoid this problem by using two different lists of word pairs,
with instructions to form the desired type of image, and again found no difference in recall.
Since they also found that bizarre images took longer to form than commonplace ones (mean
5.98 sec as opposed to 3.94 sec), they concluded that bizarre images are less efficient and should
be discouraged.

Reese (1970) used simple sentences, and avoided using different sets of words by changing 
only the verb. Thus ‘The cat is carrying the umbrella’ and ‘The chicken is carrying the flag’
describe bizarre interactions, yet ‘The cat has the umbrella’ and ‘The chicken has the flag’ are
supposedly commonplace. The finding of no significant difference in recall does not entirely rule
out the possible value of bizarreness.

Finally, Wollen, Weber & Lowry (1972) make the important point that many previous
experiments could have been confusing the two attributes of bizarreness and relatedness or 
interaction (e.g. Persensky & Senter, 1970). For example, subjects instructed only to form a
bizarre image of ‘sofa-elephant’ would probably incorporate interaction (e.g. an elephant sitting
on a sofa). The same image would probably result if subjects were instead instructed to form an 
interacting image. The authors attempted to distinguish between the two with pictures of pairs of 
objects, shown either as interacting or non-interacting and bizarre or non-bizarre. They 
concluded that bizarreness alone was not an effective variable, and that bizarre pictures
facilitated learning only to the extent that they also depicted interaction. However, as Nappe & 
Wollen (1973) point out, it is dangerous to equate pictures produced by the experimenter with 
images produced by the subjects. 

In spite of possible criticisms, several experiments have thus disagreed with Delin and have 
failed to support the mnemonic rule that remained unchallenged for centuries. Paivio’s (1971) 
conclusion still seems largely true: 


The contrast between their results and Delin’s leaves the role of bizarreness in some doubt. Since Delin used
a much longer retention interval than the other investigators, it is possible that bizarreness is especially
important for long term retention. Further research is needed to resolve the issue, but the evidence to date
does not indicate that bizarreness is a potent determinant of the effectiveness of mediating images.


Thus several issues remain undecided, and any experiment designed to investigate them must 
take four considerations into account. 

(1) The use of instructional set alone is not entirely satisfactory (cf. Paivio & Yuille, 1969), 
and stimulus attributes probably offer a better independent variable (Paivio, 1971). 

(2) However, it is equally unsatisfactory to use entirely different words for the ‘bizarre’ and 
‘ordinary’ conditions, and the relatedness of the items must be separated from bizarreness as an 
attribute (Wollen et al. 1972). 

(3) Use of any form of mediation can only be inferred, but asking subjects to describe or rate 
their images makes such inferences a little safer. Moreover, adult subjects in particular may be 
tempted to use other methods too (e.g. verbal rehearsal) if a recall task is expected (Anderson & 
Kulhavy, 1972; Nappe & Wollen, 1973). Thus, an image-rating task followed by unexpected 
recall offers extra grounds for inferring that subjects really did use images, and for trying to 
distinguish between different kinds of image. 

(4) Delin’s study was the only one to measure delayed recall, and the only one with positive 
results. Long-term and immediate recall could therefore both be assessed. No previous 
experiment on bizarreness has involved both, and Paivio’s suggestion about the possible value of 
bizarreness in delayed recall remains unexamined. 


Method 


This investigation asked 12- and 13-year-old subjects to rate short sentences for their image producing value, 
with and without warning about subsequent recall. After an interference task during a retention interval 
subjects were tested for retention under different conditions. A control set of ‘abstract’ sentences was also 
used for purposes of comparison. 


Design 
A two-way factorial design with repeated measures on one factor was employed with three different kinds of 
sentence (bizarre, normal, and abstract) and three conditions of recall (immediate recall with and without 
instructions during the acquisition phase, and delayed recall after one week). All subjects performed the 
rating task with all the sentences but were randomly allocated between the three recall conditions. 

The specific effects of particular words were balanced out by using two sets of sentences in which the 
words appeared in both normal and bizarre sentences (see below). The same abstract sentences appeared in 
both sets of material.


Subjects


The subjects were 108 12- and 13-year-olds, the whole second-year group of a local mixed comprehensive 
school. They were randomly divided into six groups of 18. Schonell IQ test scores showed no significant 
differences in intelligence between the groups. 
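
A random division of this kind can be sketched as follows. The sketch assumes subjects are identified by the numbers 1 to 108 and pairs each of the six groups with one sentence list and one recall condition; the list labels ‘A’ and ‘B’, the fixed seed and the function name are hypothetical and serve only to make the example reproducible.

```python
import random


def allocate(subject_ids, n_groups=6, seed=0):
    """Shuffle the subjects and cut the shuffled order into equal-sized groups."""
    rng = random.Random(seed)  # fixed seed so that the illustration is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_groups
    return [shuffled[i * size:(i + 1) * size] for i in range(n_groups)]


subjects = list(range(1, 109))  # 108 hypothetical subject numbers
groups = allocate(subjects)

# One group per combination of sentence list and recall condition.
labels = [(lst, cond) for cond in ("I", "II", "III") for lst in ("A", "B")]
design = dict(zip(labels, groups))
print({label: len(members) for label, members in design.items()})
```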


Materials 


Two lists of 18 sentences, each containing a single subject noun, verb and object noun, were devised. There
were three types of sentence (constructed on the basis of a pilot study using about 50 11-year-olds from the 
same school) as follows: 


Six ‘normal’ sentences containing two nouns of high imagery value, related in an ordinary way, e.g. 


‘The man smoked a cigar’. 
‘The hen pecked the worm’. 


Six ‘bizarre’ sentences formed by using exactly the same words throughout, but rearranged in unlikely or
ridiculous combinations. The two ‘normal’ sentences above thus become 


‘The man pecked the worm’. 
‘The hen smoked a cigar’. 


Control for relatedness and specific properties of individual words was thus achieved. Two lists of sentences 
were used. The two normal sentences above appeared in one list, the two bizarre ones in the other. Both 
lists thus had six sentences of each type. 


Six ‘abstract’ sentences, containing words of low imagery value but similar Thorndike-Lorge frequency to 
the high imagery words, were also included in each list, mainly to provide a base-line and as an extra check 
that imagery was being used, e.g. 


‘The idea changed the rule’. 
‘The cost upset the plan’. 


Thus, one of the lists read as follows: The dog wrote on the blackboard. The man smoked a cigar. The
doctor lived in a pond. The others knew nothing. The soldier waved the flag. The teacher lived in a kennel.
The cat licked the kitten. The idea changed the rule. The headmaster read the newspaper. The answer
pleased somebody. The horse drove the car. The cost upset the plan. The monkey climbed up a tree. The
excuse surprised them. The policeman ate the hay. The promise caused hopes. The hen pecked the worm.
The fish spoke on the telephone.
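
The way a pair of normal sentences yields a pair of bizarre ones, with each version assigned to a different list, can be sketched as below. Only the one pair of sentences quoted above is used; the representation of a sentence as a (subject, predicate) tuple and the function name are assumptions made for illustration.

```python
# Each normal sentence is split into (subject phrase, predicate phrase).
normal_pairs = [
    (("The man", "smoked a cigar"), ("The hen", "pecked the worm")),
]


def make_versions(pairs):
    """Return the normal sentences and the bizarre ones formed by exchanging predicates."""
    normal, bizarre = [], []
    for (subj_a, pred_a), (subj_b, pred_b) in pairs:
        normal += [f"{subj_a} {pred_a}.", f"{subj_b} {pred_b}."]
        bizarre += [f"{subj_a} {pred_b}.", f"{subj_b} {pred_a}."]
    return normal, bizarre


normal, bizarre = make_versions(normal_pairs)
print(normal)   # these two sentences go in one list
print(bizarre)  # their rearranged versions go in the other list


def words(sentences):
    """Collect every word used, ignoring the full stops."""
    return sorted(w for s in sentences for w in s.rstrip(".").split())


# The two versions use exactly the same words, so word-specific effects are balanced.
same_words = words(normal) == words(bizarre)
```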


Procedure 


The initial task for each subject was to rate the sentences for their ‘image value’. In effect the task 
performed the dual function of checking on the normality or bizarreness of the sentences and at the same 
time constituting an incidental learning task. 

Subjects in groups and in their own classrooms were asked to ‘picture in your mind’ each sentence 
presented aurally and to rate the image produced as ‘ordinary’, ‘unlikely’ or ‘don’t know’. A fourth
category, ‘can’t image’, was also provided. Supplementary descriptions and prior practice were given as
follows:


‘Ordinary’: ‘the sort of thing you might really see’.

‘Unlikely’: ‘ridiculous, the sort of thing you’d never really see, even though you can form a picture of
it in your mind’.

‘Don’t know’: whether the image is ordinary or unlikely, but can form a picture.

‘Can’t image’: very difficult or impossible to form a picture at all.


Subjects were instructed to rate their images rather than the verbal meaning of the sentence. The 
sentences were read aloud in as flat a voice as possible, at the rate of about two words per second, with an 
8 sec gap between sentences. After rating the sentences, subjects performed an interference task: counting
the letter e’s in newsprint for 1 min. This was given to reduce the recency effect (cf. Glanzer & Cunitz, 1966).

Recall. There were three recall conditions, with one group using each list for each condition. In each case,
subjects were asked simply to write down anything at all that they could remember. 


Condition I. Expected recall. Subjects were given standard free recall instructions before presentation and 
rating. They were reminded of these after the interference task to allow for the time needed to instruct group
II below. Unexpected free recall was tested one week later.


Condition II. Unexpected recall. Subjects were given standard free recall instructions, but only after the 
interference task. Unexpected free recall was tested one week later. 


Condition III. No recall task was given after the interference task. Unexpected free recall was tested one
week later. 


Results 
Rating results 


The ratings had been used largely to ensure that subjects really were using appropriate imagery, 
and to provide a convincing task for groups not expecting recall. As the pilot study had 
suggested, the sentences were very largely rated ‘correctly’ (chi-squared significant at the 0.1 per
cent level).


Recall results 


One point was scored for each noun recalled, whether in context or not. Table 1 shows the
combined groups’ mean score for each sentence type and condition of immediate recall. The
maximum possible score was 12 (two nouns in each of six sentences), with 36 subjects in each combined group.
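
The scoring rule can be sketched as follows. This is a minimal illustration, assuming that a noun counts as recalled when it appears as a whole word anywhere in the subject’s written response; how misspellings or plurals were treated is not stated, so the sketch uses exact matching only. The function name and the sample response are hypothetical.

```python
import re


def score_recall(written_response, target_nouns):
    """Score one point for every target noun that appears anywhere in the response."""
    words = set(re.findall(r"[a-z]+", written_response.lower()))
    return sum(1 for noun in target_nouns if noun in words)


# Nouns from the two 'normal' example sentences; a full set would hold 12 nouns.
normal_nouns = ["man", "cigar", "hen", "worm"]

# A hypothetical response in which two nouns are recalled, one out of context.
score = score_recall("The hen smoked. A worm.", normal_nouns)
print(score)
```

Because context is ignored, ‘hen’ earns a point here even though it is attached to the wrong verb.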


Table 1. Combined groups’ mean scores on immediate recall of a list containing normal, bizarre
and abstract sentences


                                 Normal       Bizarre      Abstract
                                 sentences    sentences    sentences

I.  Expected recall groups       5.6          8.1          2.0
II. Unexpected recall groups     4.1          7.3          1.8


Two-way analysis of variance showed that the nouns in sentences rated as producing bizarre 
images were recalled significantly better than the same nouns in sentences rated as producing 
ordinary images (P < 0.01) in both expected and unexpected recall.


Table 2. Immediate recall


Source                           d.f.    MS       F        P

Variable A (recall condition)    1       39       8.1      <0.01
Variable B (sentence type)       2       609.5    156.3    <0.01
A × B                            2       7.5      1.9      n.s.


The means were further analysed by Tukey’s procedure (as described by Edwards, 1967). The 
difference between the two recall conditions was significant for normal sentences (P < 0.01) but
not for bizarre ones. One conclusion could be that bizarre images were particularly useful in
unexpected recall, a claim previously made for images in general (e.g. Bower, 1972; Lesgold &
Goldman, 1973). Finally, the abstract sentences were worst recalled under both conditions, 
lending some strength to the inference that subjects really were trying to form images. 


Table 3. Combined groups’ mean scores, long-term recall of normal, bizarre and abstract
sentences


                                          Normal       Bizarre      Abstract
                                          sentences    sentences    sentences

I.   Previous expected recall groups      4.0          8.4          1.5
II.  Previous unexpected recall groups    3.1          6.5          0.8
III. Groups with no previous recall       0.7          2.8          0.2





Table 3 shows long-term unexpected free recall scores for all groups, one week after 
presentation (maximum possible 12). Two-way analysis of variance of those results (Table 4) 
showed that, as before, the same nouns were recalled significantly better in sentences rated as 
producing bizarre images than in sentences rated as producing normal images (P < 0.01) under all
conditions. The difference between normal and abstract sentences is significant under conditions
I and II (P < 0.01) but not under condition III (P < 0.05).


Table 4. Delayed recall


Source                           d.f.    MS       F        P

Variable A (recall condition)    2       319.4    67.2     <0.01
Variable B (sentence type)       2       723.3    267.9    <0.01
A × B                            4       44.2     16.4     <0.01

Discussion 


As always, use of imagery can only be inferred, but the use of explicit, detailed instructions and 
the rating task, coupled with the significantly poorer recall of abstract items, suggest that 
subjects really were forming images. If so, the results strongly support the value of imagery 
bizarreness in immediate as well as long-term recall. There is also the possibility, based on 
immediate recall results under condition II and long-term recall results particularly under 
condition III, that bizarreness could be especially effective when recall is not expected.
Certainly, combined with previous instructions and practice, bizarre images reduced forgetting 
over a period of one week to zero. (In fact, there was a slight, but non-significant, increase in 
recall under condition I, from 8.1 to 8.4.) 
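
The change in mean recall over the week can be worked out directly from the group means. The short sketch below uses the means for conditions I and II as read from Tables 1 and 3 (with decimal points restored, which is an assumption about the printed values) and subtracts immediate from delayed recall; the variable names are illustrative only.

```python
# Mean recall scores (maximum 12) as read from Tables 1 and 3.
immediate = {
    "I": {"normal": 5.6, "bizarre": 8.1, "abstract": 2.0},
    "II": {"normal": 4.1, "bizarre": 7.3, "abstract": 1.8},
}
delayed = {
    "I": {"normal": 4.0, "bizarre": 8.4, "abstract": 1.5},
    "II": {"normal": 3.1, "bizarre": 6.5, "abstract": 0.8},
}

# Positive values mean better recall after one week; negative values mean forgetting.
change = {
    cond: {
        kind: round(delayed[cond][kind] - immediate[cond][kind], 1)
        for kind in immediate[cond]
    }
    for cond in immediate
}

for cond, row in change.items():
    print(cond, row)
```

On these figures, bizarre sentences under condition I are the only cell showing no loss over the week. These are differences between group means only and say nothing about significance.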

Disagreement with other studies could be explained by the fact that this experiment used 
children — most previous ones have used adult students. Even so, why should bizarre images 
have been apparently more effective than common ones? One possibility is that bizarre 
sentences or images are less likely to have occurred in the subject’s past experience. In 
behaviouristic terms, lack of interference due to low frequency of association could thus be 
useful in the recall task. Lesgold & Goldman (1973) found that imagery uniqueness was a critical
factor in mediation, and suggested that the bizarreness advocated by the ancients could have 
been ‘an approximation to encoding uniqueness’. 

Motivation or ‘subject interest’ has been offered as another explanation for the facilitation
provided by imagery in general (e.g. McNulty, 1966), without really showing why this should be 
so. Such explanations are thus only partial (cf. Bower, 1972), but form a basis for suggesting 
why bizarre images should apparently be even more effective than ordinary ones. The bizarre 
sentences were generated in a way similar to that used by Miller and his associates in work on 
anomalous sentences (e.g. Miller & Isard, 1963). Such sentences were formed from groups of 
normal ones to make strings like ‘The sticky young rhythm ate wonders’, which violate semantic 
rules but not syntactic ones. These are generally harder to process because they cannot be 
related to the subject’s experience through the stored meanings of the words. In contrast with 
the bizarre sentences, however, they contain a whole series of anomalies; the bizarre sentences, 
each formed from only two normal ones, generally contained only one discrepancy, which the 
subjects were able to resolve by use of a bizarre image. In other words, the bizarre sentences 
perhaps involved a certain amount of ‘cognitive dissonance’ by presenting the subject with a 
‘mis-match’ to be resolved, but without making the task virtually impossible. 

Paivio (1974) has recently argued against discussing images in terms of metaphors based on 
existing technology, and prefers to compare imagery with actual perception rather than wax 
tablets, tape recorders or computers. In this case, the link with perception could be the work of 
Berlyne on ‘collative’ stimulus variables, particularly novelty, incongruity and incongruous 
juxtaposition (e.g. Berlyne, 1966). The formation of images for bizarre sentences, unlike either 
normal or anomalous ones, might involve subjects at something close to the ‘optimum level of 
arousal’, and thus result in better recall. An explanation of the facilitation due to bizarreness in 
mental images can thus be offered within the terms of a cognitive approach to perception itself; 
but even if this explanation is unsatisfactory, the results still strongly suggest that the ancient 
Greek teachers were intuitively right in advocating the use of bizarre images to help their pupils
over 2000 years ago.


Monitoring the effects of increasing resource demands on the clarity of
visual images


ta R. Beech and Julian C. Leslie 


This paper examines the changes in the clarity of images over time under conditions varying in resource 
demands. In the experiments, subjects listened to descriptions of arrays of objects and moved a lever 
according to their assessment of the clarity of their visual images of the whole array. The position of the 
lever was sampled every half second by computer. The rated clarity of visual image (Y) increased over time 
(X) and fitted well to the power function Y = aX^b for arrays of one to five objects. The increases in set size
produced increasing demands on the resources of the limited capacity visual imagery system leading to an 
exponential-like decline in the final rating values across set size. In the first two experiments, increasing 
presentation rate produced a differentiation in the slopes (a) of the power functions fitted to each set size, 
because visualization is more difficult at the faster rate. As it is relatively more difficult to visualize larger 
arrays at the faster rate, a relativity hypothesis suggests that visualizing smaller arrays is relatively easier in 
the fast condition, producing the differentiation in slopes. The last two experiments further confirmed the 
relativity hypothesis and correspondingly falsified an alternative hypothesis predicting that images decaying 
over time produced these findings; for instance, in the last experiment, the slope (a) and exponent (b) of the 
power function were higher under the slow presentation rate indicating that this condition was easier. This 
was further confirmed by verbal reports and by the final rating values. The image decay hypothesis predicted 
the contrary result as the possibility of image decay was reduced in the fast condition. These results also 
elucidate previous experiments that demonstrated an increase in visualization latencies as a function of set 
size. In such experiments latencies increase as a function of set size because the clarity of the image is 
slower to reach criterion as the number of objects to be visualized increases. 


The way in which a visual image develops over time has received little attention so far. The 
present paper investigates the changes over time in the quality of an image which consists of 
varying numbers of objects. When we visualize a scene (e.g. a scene described in a
radio play), we become aware of several of its components simultaneously. In this situation the
capacity for visualizing several components at once can be described as parallel, and there are
probably constraints on the number of these components that can be viewed at the same time.
Furthermore, if the visual imagery system has limited resources, then the time it takes to
visualize all the components might be a function of their number; alternatively, the relative
quality of the image might decline as the number of components increases, or both these outcomes
might occur together. Another question concerning temporal changes in the clarity of images is
whether images suddenly occur or whether they increase gradually. All these problems can be 
investigated by simply asking subjects to make assessments over time concerning the quality of 
their images under conditions of varying demands on the resources of their visual imagery 
systems. 

A study of the way in which a visual image develops over time could provide an explanation 
for the nature of the results of recent experiments examining image latencies. In these 
experiments the subject typically responds after the description of one or more concrete nouns 
when he has produced an image of all of them. Thus a response is required on the basis of the 
appearance of an image (how it looks). The image may increase in clarity until it is of sufficient
quality for a response to be made, which would explain recent findings that when subjects
visualize several objects together their latencies increase as a function of the number of objects
being visualized (McGlynn & Gordon, 1973; McGlynn, Hofius & Watulak, 1974). Suppose that in 
these experiments the image of the object or objects increases gradually over time. The subject 
could have a criterion point at which the image is sufficiently clear for a response to be made. 
As the number of objects to imagine wholistically increases it may take longer to reach this
criterion. This would produce the longer latencies with larger array sizes. 

Changes in the clarity of an image over time are examined in the present study by means of a 
lever moving technique. This involves the subject moving the lever of the monitoring device for 
a fixed period of time. While he is performing this task a computer concurrently samples the 
position of the lever at specified time intervals. The lever position represents a point along a 
bipolar continuum which is his rating of the clarity of the image and the extreme left- and 
right-hand positions of the lever represent the extreme points of this continuum.

Ratings of the clarity of a visual image have been studied by Pear & Cohen (1971), who gave
subjects one or two animal names to visualize and found that they rated the image of two
animals as lower in clarity than the image of one animal; they also found that visualization
latencies were longer for two animals. McGlynn & Gordon (1973) using more subjects and one, 
two or three animal names confirmed these results with mean ratings of clarity of 86, 78 and 72, 
along a 100-point scale, and visualization latencies of 11, 15 and 18 sec for one, two or three 
animal names respectively. However these relatively long ‘latencies’ included the time taken to 
visualize the animals and then to make a clarity rating. Consequently, the subjects were actually 
making the rating for three animals approximately 7 sec later than when only one animal name 
was presented, assuming that the time taken to make a clarity rating was constant. McGlynn et 
al. (1974) improved the study by measuring visualization latencies and clarity ratings in two 
separate experiments. In the second experiment, the ratings were collected 7.6 sec after the end
of the presentation of the stimulus because 7.6 sec was the average of all the latencies in the first
experiment. The ratings were 85, 74 and 65 and the latencies were 5, 8 and 10 sec for one, two 
and three animal names respectively. The differences in the clarity ratings were more marked 
this time, probably because the ratings were all made after the same length of time. This 
suggests that the ratings of the clarity of images may be changing over time and that if more time 
had been allowed to visualize three animals the clarity ratings for three animals would have 
increased. 

The possible changes in clarity that may occur over time were monitored directly in the first 
experiment by requiring subjects to move a lever in accord with their subjective impression of 
the clarity of visualized objects in the period of time immediately after they have been 
described. 





Experiment I 
Method 


Subjects. The subjects were 16 undergraduates, half of whom were male.


Materials and apparatus. Auditory descriptions of 50 arrays of 1, 2, 3, 4 or 5 objects in a unified spatial
arrangement were recorded on magnetic tape at the rate of 2.5 sec/object. The objects had to be imagined in
square ‘pigeon holes’, in order to minimize differences in the sizes of the images (but an actual matrix was 
not provided). The first object was located in a pigeon hole labelled ‘base’. For example, one of the 
descriptions was ‘base egg, down jelly, down star, left purse’. The concrete words varied from three to five 
letters in length. 

A Revox A77 tape-recorder was used that stopped automatically when a transparent piece of tape was 
encountered. At the point where the array description finished on the magnetic tape, a 100 msec tone of 400 
Hz was spliced on to the end of the last word followed by a ‘leader’ tape and then the transparent tape. 
When the tone passed over the playback head of the tape-recorder, the tape-recorder automatically stopped. 
A photoelectric cell was situated on the tape recorder to signal to the computer (see below) when the 
passage had finished. 

The apparatus for monitoring the clarity of images consisted of a wooden box 27 × 23 × 9 cm in size with
5.5 cm of a lever protruding from the top of the box. The lever, total length 19 cm, was pivoted at the centre
of the base of the box and could move through an angle of 38° along a straight slot in the top of the box. 
The lever was calibrated by placing a linear scale from one to ten along the length of the slot and putting
contacts inside the box corresponding to each point on the scale, the pivot at the base of the lever being
used as the common contact. Once this had been done the scale markings were removed. 

This monitoring device was wired up to a remote terminal of an on-line computer control system which
utilized a Nova 2/10 minicomputer (Data General Corporation) and ACT-N programming language
(Millenson, 1973). Each terminal in this system has six response bits or sense lines. If the lever is moved
across the scale, three of the response bits are triggered in cyclic succession. This sequence of activations is
monitored continuously by a software routine to determine the current lever position. Every half a second
the current position at that moment is stored in an array, from which the data are output later. A fourth
response bit serves as an anchor position because it is only activated when the lever is in the extreme left
hand position. The programme is initiated by a signal from the photoelectric cell on the tape-recorder, and
samples the lever position 20 times. Immediately after this a buzzer sounds for 3 sec. There was a lag of 420
msec between the transparent tape passing over the photoelectric cell and the tone (which signalled the end 
of the description) passing over the play-back head. It was the responsibility of the subject to move the lever 
back to the starting position. As the first sample of the lever position occurred before the onset of the tone, 
this ensured that the starting position of the handle was at the extreme left. This reading was ignored for the 
purpose of the analysis. The second sample (half a second later) was 80 msec after the end of the tone. 
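
The position-tracking routine described above can be sketched roughly as follows. This is only an illustration, not the original ACT-N routine: it assumes that the contacts at scale points 1 to 10 are wired to the three response bits in the repeating order 0, 1, 2, that the anchor bit corresponds to scale point 1, and the names `track_position` and `ANCHOR` are hypothetical.

```python
# Sketch of decoding a lever position from three contacts wired in cyclic order.
# Assumptions (not from the paper): contacts at scale points 1..10 are wired to
# bits 0, 1, 2, 0, 1, 2, ... and the anchor bit marks scale point 1 (extreme left).

ANCHOR = "anchor"  # hypothetical label for the fourth (anchor) response bit


def track_position(events, n_points=10):
    """Return the lever position after each bit activation in `events`.

    `events` is a sequence of 0, 1, 2 (the cyclic bits) or ANCHOR.
    """
    position = 1   # the lever is assumed to start at the extreme left
    last_bit = 0   # bit assumed to be wired to scale point 1
    history = []
    for event in events:
        if event == ANCHOR:
            # The anchor bit re-synchronizes the count at the extreme left.
            position, last_bit = 1, 0
        else:
            step = (event - last_bit) % 3
            if step == 1:      # next bit in the cycle: lever moved right
                position = min(position + 1, n_points)
            elif step == 2:    # previous bit in the cycle: lever moved left
                position = max(position - 1, 1)
            last_bit = event
        history.append(position)
    return history


# Four steps to the right, two back to the left, then a return to the anchor.
trace = track_position([1, 2, 0, 1, 0, 2, ANCHOR])
```

Three bits suffice because the order in which neighbouring bits fire gives the direction of movement, while the anchor bit guards against accumulated counting errors.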


Procedure. The subjects were instructed that they were going to listen to descriptions of arrays of objects 
which they should try to visualize as a whole. A short tone indicated the end of each description, which was 
relayed to them through headphones. Their task was to move the lever in accordance with their rating of the 
clarity of the image of all the objects (or the one object). If the lever was moved to the extreme right then 
this indicated that their experience of an image of all the objects had achieved maximum clarity. They 
continued this task for 9 sec after the tone ending the description; at this point a buzzer came on for 3 sec. 
This was the signal to return the lever to its starting position on the extreme left. The tape-recorder was then 
started for the next array description. Each subject had a single session in which all 50 different array
descriptions were presented.


Results and discussion 


The monitoring of the clarity of image ratings over time has produced results which are highly 
consistent for both the males and females. These ratings for 1, 2, 3, 4 and 5 objects as a 
function of time since the end of the description are illustrated in Fig. 1(A). This figure shows 
how the function is linear for the smaller set sizes and becomes increasingly curvilinear and less 
steep with larger set sizes. In all conditions, the relationship between the judged clarity of an 
image (Y) and the time from the end of the description (X) fits the power function Y = aX^b
extremely well. Linear regressions on the linear function log Y = log a + b log X computed on the
means of the log transformed data produce correlation coefficients of 1.00 for set sizes 1, 2 and 3
for both males and females and coefficients of 0.99 and 0.98 for 4 and 5 objects for the males and
0.98 for 4 and 5 objects for the females. A summary is given in the first row of Table 1. Figure 2
illustrates changes with set size of the slope a (panel A) and the exponent b (panel B) of the
power function Y = aX^b. The untransformed ratings become more curvilinear and slightly reduce
in steepness with set size and this is reflected in an exponential-like decline in the exponent, b,
from 1.0 while the slope, a, remains constant. As there is little change in a or b for 1, 2 or 3
objects, this means that the clarity of the whole image changes in similar fashion over time 
whether 1, 2 or 3 objects are being visualized. 
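
The fitting procedure described above, a linear regression on the log transformed data, might be sketched as follows. The data here are synthetic and purely illustrative (they are not the ratings from the experiment), the sample times are assumed to be in seconds, and the function name `fit_power_function` is hypothetical.

```python
import numpy as np


def fit_power_function(x, y):
    """Fit Y = a * X**b by least squares on log Y = log a + b log X.

    Returns (a, b, r), where r is the correlation between log X and log Y.
    """
    log_x = np.log10(np.asarray(x, dtype=float))
    log_y = np.log10(np.asarray(y, dtype=float))
    # In the log-log plot, b is the gradient and log a is the intercept.
    b, log_a = np.polyfit(log_x, log_y, 1)
    r = np.corrcoef(log_x, log_y)[0, 1]
    return 10 ** log_a, b, r


# Synthetic illustration only: 19 half-second samples generated from a
# known power function with a = 1.2 and b = 0.8.
times = 0.5 * np.arange(1, 20)
ratings = 1.2 * times ** 0.8
a, b, r = fit_power_function(times, ratings)
```

With noise-free data of this kind the regression recovers the generating values of a and b and a correlation of 1; with real ratings the correlation indicates how closely the means follow a power function.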

The standard deviations of the clarity ratings for all set sizes start low, but within 1.5 sec
reach about 1.1 rating units. When only one object is visualized, there is a decline in the
standard deviations over the last 2 sec to 0.6 and 0.8 units, for the males and females
respectively, because many subjects are giving a maximum response during this end period. 

The experiment eliminates the possibility that the experience of an image suddenly occurs 
some time after the end of the description. Inspection of the data from each trial of each subject 
reveals that in most cases there are smooth increases in clarity ratings immediately after the 
description. So images seem to increase gradually in clarity over time. Furthermore, these 
results explain why visualization latencies have been found to increase as a function of set size. 
Figure 1. Mean clarity ratings (lever positions) of visual images as a function of time from the end of the
description and set size for Expts I-IV. 


In such experiments the subject experiences an increase in clarity over time in accordance with 
the power function found in the present study. He must have a criterion point on this clarity 
dimension at which the image is sufficiently clear for him to make a response. Beech (1977), using
similar array descriptions to those used in the present experiment, found that visualization
latencies varied from under 1 sec for one object to 2.2 sec for five objects. If one assumes that
subjects in that experiment experienced a similar pattern of increases in image clarity as that
found in the present experiment, the appropriate criterion point on the clarity scale may be set
(in this case about 1.8 rating units) to produce latency predictions approximately similar in
trend to those actually found by Beech (1977). 
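
The criterion argument can be made concrete by inverting the power function: if Y = aX^b, the time at which clarity reaches a criterion c is X = (c/a)^(1/b). The sketch below assumes X is measured in seconds. The criterion of 1.8 rating units is taken from the text, but the (a, b) pairs are hypothetical values chosen only to mimic a constant slope and a declining exponent; they are not the fitted parameters of Expt I.

```python
def latency_to_criterion(criterion, a, b):
    """Time X at which Y = a * X**b first reaches `criterion`."""
    return (criterion / a) ** (1.0 / b)


CRITERION = 1.8  # rating units, as suggested in the text

# Hypothetical (a, b) pairs by set size: constant slope, declining exponent.
ILLUSTRATIVE_FITS = {1: (1.0, 1.0), 3: (1.0, 1.0), 5: (1.0, 0.8)}

latencies = {
    n: latency_to_criterion(CRITERION, a, b)
    for n, (a, b) in ILLUSTRATIVE_FITS.items()
}
```

Provided the criterion exceeds a, a lower exponent b delays the moment at which the criterion is reached, which is the direction of the set size effect on latencies.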

The ratings of clarity by the end of the sampling period show a decline as a function of the 
number of objects to be visualized, which supports the findings of Pear & Cohen (1971) and 
McGlynn and others (McGlynn & Gordon, 1973; McGlynn et al. 1974). This decline in clarity 
with increased set size would be expected in a limited capacity system. As more objects are 
visualized, this would reduce the resources available in the visual system leading to a smooth 
degradation in performance. This process is called ‘graceful degradation’ by Norman & Bobrow 
(1975). It would seem that the reduction in clarity increases with set size and there is an exponential-like
decline in the exponent b. 

An alternative explanation for the decline in b and final rating values might be that there is a 
confounding between set size and the time from the onset of the description to the start of the 
Figure 2. Slopes (panel A) and exponents (panel B) of the power functions and final clarity ratings of visual
images (panel C) as a function of set size for Expts I-III. 


clarity ratings. Thus, for one object clarity rating starts immediately, but for five objects this 
time difference between the description of the first and last objects is over 10 sec. Perhaps the 
differences in final values as a function of set size that are produced by the end of the rating 
period are due to the increasing probability of image decay as a function of set size. In Expt. II 
the likelihood of image decay is reduced by decreasing the time difference between the first and 
last objects to about 4 sec. This is achieved by simply increasing the presentation rate from 2.5
sec/object to 1 sec/object. The expectation is that if image decay has been a contributory factor
to the change in the value of the exponent b, and the final rating values, then a reduction in the
potential contribution of image decay should reduce or even eliminate these changes in b and
the final rating values. 
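
The intervals quoted above follow from simple arithmetic on the presentation rate. The sketch below is a simplification that assumes evenly spaced onsets and ignores the duration of the spoken words; the function name is hypothetical.

```python
def first_to_last_interval(n_objects, sec_per_object):
    """Onset-to-onset time between the first and last object names.

    Assumes evenly spaced onsets and ignores the duration of each word.
    """
    return (n_objects - 1) * sec_per_object


slow = first_to_last_interval(5, 2.5)  # rate used in Expt I
fast = first_to_last_interval(5, 1.0)  # rate used in Expt II
```

For a single object the interval is zero at either rate, so any decay-based difference between the experiments should grow with set size.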


Experiment II 
Method 


Subjects. The subjects were 24 undergraduates, half of whom were female. One additional male was tested as one
male subject reported that he was unable to generate images. 


Procedure. The same materials and apparatus were used as before except that the presentation rate was 1
sec/object and 25 arrays of 1, 2, 3, 4 or 5 objects were given instead of 50. More subjects were run to
compensate for this. The procedure was exactly the same as before in all other respects.


Results and discussion


The changes in image clarity ratings with time are shown in Fig. 1, panel B. The ratings were 
again fitted to the power function Y = aX^b and the values of a and b as a function of set size are
shown in Fig. 2. The data are a very good fit to the power function, as indicated by the
correlation coefficients given in Table 1. The comparatively low values of 0.91 and 0.95 for set
sizes one and two may be because the mean ratings are at asymptote for approximately half the 
sampling period in these cases. 

The main difference from Expt. I is that there is a marked decline in a as a function of set 
size. An increase in the presentation rate has produced a differentiation in the slopes of the 
power functions of the different set sizes. Exponent b is lower overall than in Expt. I, indicating
that all the functions are more curvilinear; this could be mainly because the functions are
steeper, so that asymptote is reached much earlier. The pattern of results can be described
as the same pattern as might have been found in Expt. I if in that experiment the sampling 
period had lasted twice as long. 

Unlike Expt. I, there are differences between the sexes in this experiment. Females have 
overall higher exponents (b) than males (except for set size 1), but the decline in a is 
approximately the same for both sexes. Standard deviations of ratings for each set size are 
relatively small for the first sampled time period, but thereafter vary from about 2.1 to a
maximum of 4.2 for males and from 2.0 to 3.6 for females, computed over each of the 19 time samples for
each set size. This increase in standard deviations may have been because fewer readings were
taken in Expt. II.

The main result of the experiment is that there is a marked decline in the slope, a, of the 
power functions across set size and an overall decline in the exponent, b, across set size. This 
result may be compared to Expt. I where the presentation rate is slow and the slope a of the
power function is constant across set size. These two experiments may be interpreted in terms 
of the parallel limited capacity model as indicating that visualization is more difficult at the 
faster presentation rate. The rating scale may be considered as a relative scale where the clarity 
rating of a condition is made in relation to the relative clarity of the other conditions so far 
experienced in an experiment. Comparing across experiments, visualizing five objects described 
quickly is more difficult than visualizing five objects described more slowly. When only one 
object is described, the comparative ease of visualizing one object compared to five objects 
presented quickly, is greater than the comparative ease of visualizing one object compared to 
five objects presented slowly. If this were not the case and rating scales were reflecting absolute 
judgements of clarity, the power function for one object would be identical in both experiments. 
Instead the power function for one object in the fast experiment is much steeper than in the 
experiment with the slower presentation rate. Corroborative evidence that Expt. I is the easier 
condition comes from a study by one of us which examined the limit in the number of objects 
that can be visualized wholistically across various presentation rates. It was found that most 
objects were visualized at a presentation rate of 2.5 sec/object (Beech, 1976).

The greater differentiation in the slopes of the power functions with increased presentation 
rate may be a mathematical artifact; it might have been produced because the zero intercept (log
a) of the transformed function is made artificially higher as a result of the very long periods at
asymptote in Expt. II. This possibility was tested by taking just the first ten values before
asymptote is reached and computing linear regressions, which had the effect of reducing a from
2.58 to 2.31 for five objects but only reducing a from 1.64 to 1.63 for one object. So although
the long asymptote does produce some change in the parameter a, this produces only a slight 
change in the slope of the function in Fig. 2(A). 
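
The way a long period at asymptote can raise the fitted intercept may be illustrated with synthetic data. In the sketch below a power function is clipped at the top of a ten-point scale and then fitted twice, once to all samples and once only to the samples below the ceiling. All values are hypothetical, and the selection of pre-asymptote samples is a simplification of the first-ten-values check described above.

```python
import numpy as np


def fit_loglog(x, y):
    """Return (a, b) from a least-squares fit of log Y = log a + b log X."""
    b, log_a = np.polyfit(np.log10(x), np.log10(y), 1)
    return 10 ** log_a, b


# Hypothetical generating function, clipped at the upper limit of the scale.
times = 0.5 * np.arange(1, 20)          # 19 half-second samples (assumed seconds)
true_a, true_b, ceiling = 3.0, 0.8, 10.0
ratings = np.minimum(true_a * times ** true_b, ceiling)

# Fit to every sample, including the flat stretch at the ceiling.
a_all, b_all = fit_loglog(times, ratings)

# Fit only to the samples recorded before the ceiling is reached.
below = ratings < ceiling
a_pre, b_pre = fit_loglog(times[below], ratings[below])
```

In this illustration the fit to all samples gives a higher a and a lower b than the fit restricted to the pre-asymptote samples, which is the direction of distortion that the check above was designed to detect.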

The image decay hypothesis predicted a reduction in the differentiation of the clarity of the set 
sizes in Expt. II, but the opposite result has occurred: differentiation between the set sizes has 
increased. However, it might be argued that the fast presentation rate makes visualization easier
and not more difficult, in spite of the evidence of Beech (1976). On this argument reducing the
probability of image decay has made the objects easier to visualize and consequently the
ratings increase more rapidly. However, the image decay hypothesis can make no prediction
concerning the outcome of the next experiment, because in Expt. III the task is made easier by
removing the spatial descriptors between the objects, but the presentation rate is maintained at 
the same fast rate as in Expt. II. On the other hand, the relativity hypothesis predicts that as the 
task is easier, there should be a reduction in the slope of the linear function plotted in Fig. 2(A). 
In other words, the slopes of the power functions should become less differentiated compared to 
the slopes of Expt. II. 


Experiment III
Method
Subjects. The subjects were 24 undergraduates, half of whom were male.


Procedure. The objects have been previously described as being imagined in square pigeon holes. This was 
intended to control differences in the size of images especially as Kosslyn (1975) had demonstrated that 
image latencies are influenced by the relative size of images. As the spatial descriptions within pigeon holes 
were no longer given, a set of relatively small objects was used within 25 arrays. The presentation rate was 
1 sec/object. 


Results and discussion 


Figure 1(C) illustrates the mean ratings of image clarity. The pattern of results is broadly similar
to Expt. II, where the same presentation rate was used, but the final clarity values at the end of
the sampling period are less differentiated. The correlation coefficients produced by fitting linear
regressions to the power function are shown in Table 1.


Table 1. Correlations based on regressions on the function log Y = log a + b log X

                      Set size
             1      2      3      4      5
Expt. I     1.00   1.00   1.00   0.98   0.98
Expt. II    0.91   0.95   0.99   0.99   0.99
Expt. III   0.95   0.97   0.98   0.99   0.91


As in Expt. II the correlations are
slightly reduced to 0.95 and 0.97 for set sizes one and two respectively. The decline in the slope,
a, of the power function across set size is less marked than in Expt. II, as illustrated in Fig. 2, panel
A. This provides confirmation of the relativity hypothesis because there is less differentiation
between the slopes of the power functions across set sizes as the task becomes easier. Similarly
the curvilinearity of the function has reduced; this is demonstrated by higher values of the exponent,
b, of the power function (see Fig. 2(B)); this further demonstrates that the task is easier and not
more difficult as predicted by the image decay hypothesis. On the other hand the image decay 
hypothesis in any form cannot take any account of the differences in the results as no 
manipulation of this hypothesis has taken place. Panel C of Fig. 2 summarizes the final clarity 
ratings of the objects after the sampling period for the last three experiments. In all cases there 
is an exponential-like reduction in clarity with set size in spite of such manipulations as varying 
the presentation rate and omitting to specify spatial relationships. This indicates that the
subjective impression of the clarity of the whole image as a function of set size is relatively
robust after an appropriate length of time has elapsed, but the intervening period varies in quite a
lawful way according to the kinds of manipulations made on the presentation of the
description.

As in Expt. II there are differences between the sexes in the present experiment. The 
exponent (b) is overall higher for the males which is the opposite finding to Expt. II. Also the 
slope of a as a function of set size is lower for the males than for the females. In terms of the 
relativity hypothesis this pattern of results suggests that in Expt. II females were slightly better 
than males but when conceptualizing spatial relationships was not required then males found the 
task substantially easier. In other words, females find conceptualizing spatial relationships easier 
than males. On the other hand, the females may not have coded the spatial relationships to the 
same extent as the males and so are not influenced by the omission of spatial descriptors in 
Expt. III. Previous studies would tend to support this latter interpretation as males tend to excel
in spatial ability (Guildford, 1967; Hutt, 1972). The relative changes in performance between the
sexes in these two experiments cannot be accounted for in terms of differences between the
groups that were sampled. In Expts II and III subjects were given the Betts Questionnaire Upon
Mental Imagery (Sheehan, 1967). In Expt. II males and females had similar scores, 2.82 and
2.60, which were not significantly different (t = 0.85, d.f. = 22) and in Expt. III scores were 2.63
and 2.38 respectively, again not significantly different (t = 0.76, d.f. = 22). As in Expt. II,
standard deviations are generally small for the first sampled time period and thereafter vary
between 1.4 and 3.8 for the females and between 1.1 and 4.4 for the males.

Although the elimination of the spatial relationships made the task slightly easier, the resulting 
change in the pattern of the clarity ratings is not substantial even though the changes are in the 
predicted direction. A better test of the relativity hypothesis would be to have the same subject 
experience both the fast and the slow presentation rates and observe the relative changes in the 
slopes and exponents of the power functions. This test is carried out in Expt. IV. On the basis 
of this comparison subjects can also report which presentation rate is the easier for 
visualization. The relativity hypothesis would predict that objects presented quickly would be 
more difficult and therefore parameter a, the slope of the power function, would be lower and b, 
the exponent, would be lower because the function would have a lower slope and would be more
curvilinear. But the image decay hypothesis would make exactly the opposite predictions: the
fast presentation rate would be easier to visualize as image decay is reduced, so that in the fast
condition the ratings would increase more quickly than for the slow condition. Another
hypothesis, in this case making a similar prediction to the image decay hypothesis, predicts that
the steep rise in ratings over time with faster presentation rate is because of the demand
characteristics of the experiment. Somehow, the fast presentation rate encourages fast modes of
responding; consequently ratings should increase more quickly after a fast presentation rate, if
this ‘context effect’ has been playing a major role in the previous experiments. 


Experiment IV 
Method 


Subjects. The subjects were 18 undergraduates, half of whom were female.


Procedure. Similar materials to Expts I and II were employed. Fifteen arrays of objects described in pigeon
holes were constructed. The three conditions comprised one object, four objects presented quickly (1 sec/object) or
four objects presented slowly (2.5 sec/object). These conditions were presented pseudo-randomly so that
each condition occurred once every three trials. All the subjects were asked afterwards whether the four 
objects presented quickly or the four objects presented slowly was the easier condition. 


Results and discussion

Figure 1(D) illustrates the changes in image clarity with time. The ratings for four objects
presented slowly rise more quickly and reach a higher final value than four objects presented
quickly. Sixteen out of the 18 subjects said the slow condition was easier, one female thought
that there was no difference and one male thought the fast was easier when particular spatial
configurations were employed, otherwise the slow was easier. The correlation coefficients for the
one object, four object slow and fast conditions are 0.90, 0.97 and 0.96 respectively. The slopes,
a, of the power function are 3.10, 2.45 and 2.32 respectively and the exponents, b, are 0.605,
0.625 and 0.561. Thus the power function is steeper for the slow condition compared to the fast
with a higher a value and it is less curvilinear with a higher b value. This order of difference is
consistent across the sexes. The difference in the final ratings of the fast and slow conditions
was compared by t test and found to be significantly different (t = 3.04, d.f. = 34, P < 0.01).

The standard deviations are small for the first half second and then vary between 1.7 and 4.4 for
the females and between 1.7 and 3.6 for the males.
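
The reported parameters can be used to check where the fitted slow function lies above the fitted fast function. Two power functions a1·X^b1 and a2·X^b2 intersect at X = (a2/a1)^(1/(b1 − b2)). The sketch below uses the a and b values reported above; the units of X are not restated here, so the crossover is expressed simply in units of X, and the function names are hypothetical.

```python
# Fitted parameters reported for Expt IV: (slope a, exponent b).
PARAMS = {
    "one object": (3.10, 0.605),
    "four slow": (2.45, 0.625),
    "four fast": (2.32, 0.561),
}


def power_function(x, a, b):
    """Predicted clarity rating Y = a * X**b."""
    return a * x ** b


def crossover(a1, b1, a2, b2):
    """X at which a1 * X**b1 equals a2 * X**b2 (requires b1 != b2)."""
    return (a2 / a1) ** (1.0 / (b1 - b2))


x_cross = crossover(*PARAMS["four slow"], *PARAMS["four fast"])
slow_above_fast = all(
    power_function(x, *PARAMS["four slow"]) > power_function(x, *PARAMS["four fast"])
    for x in (0.5, 1.0, 5.0, 9.5)
)
```

The two fitted functions cross at a value of X a little above 0.4, so beyond that point the slow condition is predicted to give the higher rating throughout, in line with the ordering described above.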

These findings add further support to the relativity hypothesis which proposes that a higher 
slope (a) of the power function indicates that a task is easier relative to other tasks having lower 
slopes within the same experiment. The image decay hypothesis is once more rejected by the
present experiment. Even though the extent of image decay might be reduced with increases in 
presentation rate all the parameters examined, including verbal reports, indicate that the slow 
condition is easier to perform. Furthermore the ‘context’ hypothesis, which proposes that 
subjects are responding more quickly because of the context of a fast presentation rate must also 
be rejected on the basis of the results of the present experiment, as the fast condition this time 
elicits slower increases in clarity during the sampling period. 

General discussion 

The experiments reported in this paper investigated the changes in image clarity which are 
experienced over time as a function of the number of objects that are visualized. These 
experiments provide evidence that processing within a visual imagery system is of limited 
capacity and parallel. In the first two experiments the relationship found between the judged 
clarity of an image (Y) and the time from the end of the description (X) gives a good fit to the 
power function Y = aX^b. In Expt. I an increase in set size beyond three objects produces an
exponential-like decline in the exponent b and also leads to a
concomitant reduction in final value of the clarity ratings. In terms of the parallel limited capacity
model this suggests that resources become increasingly less available beyond three objects. An 
increase in the presentation rate in the second experiment produces a differentiation in the slopes 
(a) of the power functions fitted to the different set sizes and a concomitant reduction in the 
exponent (b) (increasing the curvilinearity and steepness over time of the untransformed data); 
the latter reduction is because the steep slopes of the functions make contact with the upper 
limit of the scale much earlier than in Expt. I. The differentiation in slopes with a faster 
presentation rate is because the comparative ease of visualizing one object compared to larger 
set sizes presented quickly is greater than the comparative ease of visualizing one object 
compared to larger set sizes presented slowly, according to the parallel limited capacity model. 

The next two experiments further confirmed this interpretation and furthermore discounted an 
alternative hypothesis suggesting that fast passages were easier because the possibility of image 
decay is reduced as the interval between the description of the first and last objects is reduced 
from 10 sec to 4 sec. In Expts I and III ratings increased more rapidly after the fast descriptions 
suggesting that visualization is easier at the faster rate and superficially confirming the image 
decay hypothesis. However in Expt. III the presentation rate was maintained at the same fast 
rate (thereby keeping any image decay effects constant) and the descriptions were made 
comparatively easier by eliminating the spatial descriptors and the slopes of the power functions 
become correspondingly less differentiated compared to the slopes from the first fast experiment
(Expt. II). Similarly, when the same subjects experienced fast and slow descriptions of arrays of
four objects (Expt. IV) the slope of the power function in the slow condition was higher and the
exponent was higher; in terms of the untransformed data the function is steeper and less
curvilinear, indicating that within the conditions experienced by the same subject, a condition
with a higher slope is easier relative to the condition with the lower slope. The image decay 
hypothesis would have predicted the contrary result. In addition, the final clarity value for the 
objects described slowly was significantly higher than the objects described quickly and nearly 
all the subjects considered that four objects described slowly were easier to visualize adding 
further evidence against the image decay hypothesis. The fast increases in the clarity ratings 
with the fast presentation rates in the first experiments might also have been produced by a 
context effect so that the fast presentation rate elicited faster modes of responding resulting in 
steeper slopes across time. But in the final experiment the slow presentation rate elicited the 
faster mode of responding excluding this particular explanation. 

In these experiments subjects have been making a subjective assessment of the clarity of their 
images of objects that are described to them. This has required an aesthetic assessment 
concerning the quality of information available for inspection within the mind’s eye. A model 
has been proposed which has provided a consistent interpretation of this series of experiments. 
The visual imagery system is considered as a spatially parallel system (Paivio, 1971) with limited 
resources. When set size is increased this reduces the resources (or power) available to the 
visual imagery system which the subject experiences as a concomitant reduction in the clarity of 
the overall image. Similar parallel systems have been proposed within other
experimental paradigms; for instance, Corcoran (1971) has suggested that the mechanism involved
in Sternberg’s search experiments (e.g. Sternberg, 1966) need not be an internal serial 
comparison process, but might be a parallel search in a system having a limited power source; an 
increase in set size would then produce a concomitant reduction in power and slow down search 
times. In visual imagery the resources for visualizing the image of each object may be reduced 
because they also have to be allocated to the other objects. This dispersal of resources makes 
little impact on the clarity of the overall image up to three objects but for larger set sizes the 
reduction seems to be exponential with the exponential-like reduction in the final clarity values 
across set size. Kosslyn & Pomerantz (1977) also suggest that the production of an image has a
finite processing capacity; however, their proposals for the nature of this limitation are at variance
with the findings in this paper. They suggest that the limitation is produced by the rate at which 
images are constructed and the rate at which they decay. By analogy, a bucket leaking near the 
top may never be filled to the brim with water. Similarly if there is too much to visualize then 
portions of the image fade (i.e. ‘leak’) before wholistic visualization is achieved. But in Expt. IV 
of the present paper, the possibility of image decay was reduced (assuming that image decay is 
time dependent) by increasing presentation rate; this was found to make visualization more 
difficult rather than easier. The pattern of clarity values produced in these experiments suggests 
that the clarity of the overall image was either increasing or had reached asymptote over the 
sampled period. Thus the experience of the whole image does not seem to deteriorate at all; 
however this finding of a non-deteriorating image may have been produced by the demand 
characteristics of this particular series of experiments.

An imagery mnemonic for the learning of people’s names


Peter E. Morris, Susan Jones and Peter Hampson 


Since the failure to remember the name of a person to whom one has been introduced can be 
embarrassing, methods of improving the recall of names to faces are desirable. As a means of 
learning names Lorayne (1958) suggests a mnemonic technique the effectiveness of which was 
tested in the present experiment. 

Lorayne’s method involves first converting the name to be retained into an easily imaged form. 
For example, Fishter can be made into fish stir and be imaged as a fish stirring and Gorden can 
become garden. The next step involves choosing a prominent feature of the person’s face, and 
linking the image of the name to it. Thus, if Mr Gorden has a large nose an image could be 
formed of a garden growing over his nose. When recall of the name is required the face should
recall the image, the image should cue the substitute form of the name, and this, in turn, should
lead to recall of the appropriate name. Lorayne maintains that his mnemonic system enables him to perform
impressive stage demonstrations of memory for names. For example, he reports being able to 
name almost 400 people in 7 min (Lorayne & Lucas, 1976, p. 77). 

The method may seem bizarre, but it incorporates mnemonic techniques which have been
shown experimentally to be powerful aids to memory in verbal learning experiments (e.g.
Bower, 1970; Morris & Stevens, 1974).


Method 


Two sets (A and B) of 13 black and white photographs of adult males were taken from newspapers, avoiding 
well-known individuals. They were selected for uniformity of size, frontal view and naturalness of facial 
expression. Each face was assigned a name, selected randomly from a telephone directory. Each photograph 
was stuck to a card, with the name printed beneath it. For testing recall, the lower part of the card, with the 
name, was folded back, concealing the name from the subject. For the training of the mnemonic group of 
subjects, five examples of the use of the mnemonic were taken from Lorayne (1958, pp. 142-143); along with 
his descriptions of the application of the mnemonic to these examples. Five further face-name pairs were 
taken from Lorayne to provide the subjects with practice at the technique. 

Forty undergraduates at Lancaster University volunteered to act as subjects and were tested individually. 
Twenty formed the mnemonic group, 20 the control group. Both groups were first tested on their ability to 
recall names to faces, using one set of the photographs. In each group, half the subjects were tested on set 
A and half on set B. Each photograph-name pair was displayed to the subject for 10 sec. At the end of the 
list the cards were shuffled and recall of the names was tested, allowing 10 sec for each item. The subjects in 
the control group then participated in an experiment on backward spelling, which took between 5 and 10 
min. Afterwards, they were tested on their learning of the second set of photograph-name pairs. Finally, 
they were questioned on their method of learning the pairs. 

The mnemonic group had the mnemonic method explained to them and read through the five examples 
from Lorayne. They practiced the method on the further five examples from Lorayne, and were then tested 
on the second set of photograph-name pairs. The second testing for both groups followed the same 
procedure as the testing of the first list. 


Results 


The means and standard deviations of recall for the groups on the two tests are shown in Table 1. An
analysis of variance was conducted on the number of names correctly recalled by each subject,
with groups (2) and tests (2) as factors. Both main effects and the interaction were significant
(P < 0.01). Simple main effects were therefore calculated. There was no effect of groups on test 1
(F = 1.56, d.f. = 1, 76, P > 0.1) but on test 2 the difference was highly significant (F = 57.18,
d.f. = 1, 76, P < 0.01). The control group did not improve between tests 1 and 2 (F = 0.80) while
the mnemonic group improved significantly (F = 68.35, d.f. = 1, 38, P < 0.01).

Table 1. Means and standard deviations for the recall of names to the photographs

                    Test 1    Test 2

Mnemonic group
  Mean                5.7      10.2
  S.D.                1.7       1.7
Control group
  Mean                4.9       5.4
  S.D.                2.33      2.2


When questioned on their method of learning the lists, the control subjects reported attempting
various strategies to link the names to the faces. However, none used the technique taught to
the mnemonic subjects.
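
The groups by tests interaction can be read directly from the cell means in Table 1. As a small arithmetic sketch (the variable names are illustrative, and the values are simply the means printed in the table, read as decimals), the gain from test 1 to test 2 in each group and the difference between those gains can be computed as follows. This is descriptive only and does not reproduce the analysis of variance.

```python
# Cell means from Table 1 (names recalled out of 13 photographs)
means = {
    "mnemonic": {"test1": 5.7, "test2": 10.2},
    "control": {"test1": 4.9, "test2": 5.4},
}


def gain(group):
    """Improvement in mean recall from test 1 to test 2, to one decimal place."""
    return round(means[group]["test2"] - means[group]["test1"], 1)


gains = {group: gain(group) for group in means}
# Difference between the two gains: the descriptive size of the interaction.
interaction_contrast = round(gains["mnemonic"] - gains["control"], 1)
```

On these means the mnemonic group gains 4.5 names and the control group 0.5, a difference of 4.0 names.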


Discussion 


The mnemonic technique considerably improved recall of the names to the photographs. The 
improvement occurred for subjects briefly trained in the technique, indicating that considerable 
practice in the method is not necessary. The lack of a significant improvement in the 
performance of the control group demonstrates that the better performance of the mnemonic 
group on the second test was not simply the result of practice at the task. 

Several reasons for the effectiveness of Lorayne’s mnemonic can be suggested. The power of 
imagery as an integrator of unrelated items, and the better retention of concrete than of abstract
words are well documented (Paivio, 1971). It is also likely that the analysis of both the name and 
the face required by the mnemonic technique will lead to a deeper and more relevant processing 
of them both, with a resulting improvement in retention (Lockhart, Craik & Jacoby, 1976). 

Unwillingness to undertake the effortful analysis of name and face may prevent the method 
being adopted in situations other than those where good recall of names provides a high 
repayment for the initial investment of effort. However, the results of the present experiment 
suggest that where the investment is made, considerable returns in terms of better recall can be 
expected.

Memory effects in visual spatial information processing


Harold D. Fishbein 





Eight, ten and twelve year old children were tested on a novel procedure involving the successive 
presentation of the standard and comparison stimuli. Two hypotheses were evaluated, one dealing with 
memory effects, and the other with children’s pre-testing of choice responses in spatial information 
processing. It was found, in general, for both spatial perception and coordination of perspectives tasks, that 
there was no short memory decay for spatial information, but that opportunities to pre-test choice responses 
improved performance. It was inferred from these data that the performance superiority under simultaneous
as opposed to successive conditions is attributable to opportunities to pre-test responses and not to memory
effects.


Two recent experiments dealing with children’s and adults’ visual perception of space 
(Smothergill, 1973; Fishbein, Decker & Wilcox, 1977) have found performance to be superior 
when the spatial response was made in the presence of the standard stimulus as opposed to when 
responding occurred 5 sec after the removal of the standard. One explanation offered by the 
authors of these studies was that memory for the spatial information may have decayed after the 
standard was removed. However, other recent experiments by Finkel & Smythe (1973), Jones & 
Connolly (1970), Posner (1967) and Smothergill, Hughes, Timmons & Hutko (1975) reported 
essentially no memory loss of visual spatial information following short time intervals after 
removal of the standard. How can these apparently conflicting findings be reconciled? 

One possibility, of course, is that they can’t. Sometimes there is, and sometimes there is not, 
a memory loss. Another possibility is that the performance difference between ‘standard 
present’ and ‘standard absent’ conditions is not brought about by a performance decrement 
(memory loss) in the latter conditions, but rather by a performance increment in the former 
conditions. What might be the cause of an increment? One obvious procedural difference 
between the two conditions which may have psychological consequences is that the presence of 
the standard, relative to its absence, permits different kinds of activities to occur. For example, 
when the standard is present the observer may actively engage in making comparisons between 
his potential responses and the spatial location of the standard. This would permit observers to 
test out and reject various possible choices before making an overt response. When the standard 
is absent these activities are normally precluded. 

The above explanation is not unlike what Tolman (1939) has described as ‘vicarious trial and 
error’ responding. As with Tolman’s hypothesis, there is no obvious direct way to test its 
validity. There is an indirect way, however. The essence of the pre-testing explanation is that if 
observers are given the opportunity to compare possible choices, they will do so, and this will 
aid their performance. Accordingly the following set of novel procedures was adopted. Half of 
the children received the following, visually: the standard, the choice stimulus, the standard, and 
the same choice stimulus again to which they indicated whether or not it was an accurate 
representation of the standard. The other half of the children only saw the choice stimulus once, 
at the end of the sequence. If the presentation of the intervening choice stimulus followed by the 
standard allowed children to pre-test possible decisions, then their performance should be 
superior to that of the children not receiving the intervening choice stimulus. It should be 
emphasized that in the above procedures the standard was always absent when an overt spatial 
response was made. 

In order to establish the generality of results, these procedures were used under two different 
difficulty levels of spatial information processing: perception of the location of three objects; and
coordination of perspectives of the location of these objects. Nigl & Fishbein (1974) found
performance on the latter task to be inferior to that on the perception task for children eight to 
twelve years old. 


Method 
Subjects and design 


The subjects were 48 white boys and 48 white girls from a public school in Cincinnati, Ohio. They were 
divided into three groups of 16 boys and 16 girls on the basis of grade: second grade, with mean age of 7.9
years; fourth grade, with mean age of 9.7 years; and sixth grade, with mean age of 11.9 years.

The children were run in a two-between, two-within 3 × 2 × 2 × 2 factorial design. The between-subjects
factors were age (8, 10 and 12) and number of inspections of the choice stimulus (one vs. two). The 
within-subject variables were amount of delay between the last presentation of the object array and the 
presentation of the choice stimulus (1 sec vs. 15 sec) and the nature of the spatial judgement (coordination of 
perspectives vs. perceptual recognition). 
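
The structure of this design can be laid out by enumerating its cells. The sketch below is illustrative only: the variable names are invented, and the count of children per between-subjects cell assumes that the 96 children were divided equally over the six age by inspection combinations, which the text implies but does not state cell by cell.

```python
from itertools import product

# Between-subjects factors
ages = (8, 10, 12)
map_inspections = ("one look", "two looks")

# Within-subject factors
delays_sec = (1, 15)
judgements = ("perceptual recognition", "coordination of perspectives")

between_cells = list(product(ages, map_inspections))
within_cells = list(product(delays_sec, judgements))
all_cells = list(product(ages, map_inspections, delays_sec, judgements))

n_children = 96
# Assumption: equal allocation of children to the between-subjects cells.
children_per_between_cell = n_children // len(between_cells)
```

Under that assumption there are 6 between-subjects cells of 16 children each, and every child contributes data to all 4 within-subject cells, giving the 24 cells of the 3 × 2 × 2 × 2 design.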


Materials 


The materials were: an object array consisting of a 21 × 21 in white piece of cardboard, and three black 2 in
high blocks (rectangular, cylindrical and triangular in shape) placed near the centre of the cardboard, two
of the blocks parallel to the vertical sides, and two parallel to the horizontal sides; a 23 × 18 in wooden
screen; a 6 in tall fully clothed girl doll; and a 14 in high inverted ‘L’-shaped stand to which the doll was
attached.

The choice stimuli consisted of 24 ‘maps’ of the object array. These maps were similar in appearance to
aerial photographs, but were constructed of 4% in black paper triangles, squares and circles glued near the
centre of 9 × 9 in white poster paper. Twelve of the maps were veridical, i.e. they accurately depicted the
doll’s ‘view’ of the blocks; six were 180° rotations of the doll’s view; and six were left-right inversions of
the doll’s view.


Procedure 


Each child was run individually by a white female experimenter. During the initial ‘familiarization’ phase of 
a session, the doll-stand was placed at the edge of the object array so that the doll was ‘looking’ down at the 
array 14 in directly above the centre of it. The child was shown a veridical map of the array and was told 
that it was a picture of what the doll sees from where the doll is. The experimenter, child, doll and map 
moved to a different viewing position 90° from the original one. The child was asked if the now non-veridical 
map was a picture of what the doll sees. The procedure was repeated following another 90° move. Corrective 
feedback was given in all cases. 

The experimenter and child then sat down facing the array. This viewing position hereafter will be called 
the ‘frontal’ position. The child was then told that from time to time the doll would be placed at different 
viewing positions of the blocks and that the child’s task was to decide and remember what the doll saw from 
her position. The experimenter noted that when the doll was at the frontal position, the child and doll would 
see the same thing, but when the doll was placed at either the left side of the array (pointing) or across from 
the child (pointing) the doll and child would be seeing different things. The left side and across positions of 
the doll require coordination of perspectives and spatial judgements, and these viewing positions hereafter are 
referred to as ‘non-frontal’. Finally the child was told that his memory for what the doll saw would be tested 
by asking him to tell whether the map he was shown was an accurate picture of the arrangement of the 
blocks. He was cautioned that sometimes the map would be accurate, and sometimes not. 

On each trial the experimenter placed the wooden screen in front of the child, occluding the child’s view 
of the doll and array, and rearranged the blocks and moved the doll. The delay conditions and the ‘one look’ 
and ‘two look’ conditions are represented in Fig. 1. As can be seen from the figure, the map and the object 
array were never presented simultaneously. The only difference between the 1 sec delay and 15 sec delay
conditions was that for the former, the final (or only) presentation of the map occurred 1 sec after the
second presentation of the object array, and for the latter, 15 sec later. The only difference between the one 
look and two look conditions was that in the latter, the map was presented in the time interval between the 
two presentations of the object array. Finally, the dashed line on the figure associated with the final (or 
only) presentation of the map indicates that the map was removed after the child responded. 

[Figure 1: timeline diagram. Panels: two-look conditions and one-look conditions; rows: object array, map (1 sec delay) or map (15 sec delay); horizontal axis: time (sec).]

Figure 1. Schematic representation of the time intervals and durations used of the stimulus materials in each
trial. The presentation of a stimulus (object array or map) is indicated by a rise in the line, and the occlusion
of a stimulus is indicated by a fall in the line. The dashed lines indicate that the map stays in view until the
child responds ‘yes’ or ‘no’.

Twenty-four trials were run, one-half of which involved veridical maps, and one-half non-veridical maps.
One-half of each involved the frontal positions, and one-half, non-frontal positions. One-half of each of
these involved a 1 sec delay interval between the final appearance of the array and the final appearance of
the map, and one-half involved a 15 sec delay interval. The order of delay intervals was counterbalanced
across children. The complete experimental session lasted approximately 30 min.
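
The successive halving described above implies a 2 × 2 × 2 arrangement of trial types with three trials of each type. The sketch below enumerates that arrangement; the names are illustrative, and the order in which trials were actually presented is not reconstructed here.

```python
from collections import Counter
from itertools import product

map_types = ("veridical", "non-veridical")
doll_positions = ("frontal", "non-frontal")
delays_sec = (1, 15)
TOTAL_TRIALS = 24

# Every combination of map type, doll position and delay is one trial type.
cells = list(product(map_types, doll_positions, delays_sec))
trials_per_cell = TOTAL_TRIALS // len(cells)

# Unordered list of trials: each trial type repeated equally often.
trials = [cell for cell in cells for _ in range(trials_per_cell)]
delay_counts = Counter(delay for _, _, delay in trials)
```

Each of the 8 trial types occurs 3 times, so 12 of the 24 trials fall under each delay, each doll position and each map type.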


Results and discussion 

Table 1. Percentage of correct responses as a function of age, position of doll, number of map
exposures and delay

                    Map exposed twice        Map exposed once
                    Frontal  Non-frontal     Frontal  Non-frontal     Mean

8 year olds
  1 sec delay          88        56             76        42          65.5
  15 sec delay         90        58             68        48          66.0
10 year olds
  1 sec delay          94        70             86        64          78.5
  15 sec delay        100        70             90        66          80.0
12 year olds
  1 sec delay          92        74             96        78          85.0
  15 sec delay         92        82             94        84          87.0
Mean                 92.7      68.3           85.0      63.7

Table 1 presents the percentage of correct responses, corrected for guessing, for all the major
conditions of the experiment, averaged for boys and girls, whose performance was highly
similar. As can be seen, there was a progressive decrease in errors with increasing age
(F = 25.11, d.f. = 2, 90, P < 0.01) (the anova was performed on number of errors), performance
was poorer on the non-frontal trials than on the frontal ones (F = 161.81, d.f. = 1, 90, P < 0.01)
and children who saw the map twice each trial made fewer errors than those who saw the map
only once (F = 6.44, d.f. = 1, 90, P < 0.05). There was virtually no difference in performance on
trials run with a 1 sec delay as compared with those run with a 15 sec delay (F = 2.08, d.f. = 1,
90, P > 0.10). Overall, children were 76 per cent correct with 1 sec delays, and 78 per cent
correct with 15 sec delays. Only two of the two-way interactions and none of the higher order
interactions reached statistical significance. The age by number of map exposures interaction
(F = 4.35, d.f. = 2, 90, P < 0.05) reflects the observation in Table 1 that eight year olds benefited
substantially from a second ‘look’, ten year olds benefited less, and twelve year olds, not at all.
The age by position of doll interaction (F = 7.13, d.f. = 2, 90, P < 0.01) can be attributed to the
relatively poorer performance of the eight year olds, compared with ten and twelve year olds, on
the non-frontal conditions (the coordination of perspectives task).
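
The age by number of map exposures pattern can be checked descriptively from the cell percentages in Table 1. The sketch below takes unweighted averages of the printed cells for each age group; it is an arithmetic illustration with invented names, not a reconstruction of the analysis of variance, which was performed on error counts.

```python
# Percentage correct from Table 1, keyed by (age, delay in sec). Each tuple is
# (twice frontal, twice non-frontal, once frontal, once non-frontal).
table1 = {
    (8, 1): (88, 56, 76, 42),
    (8, 15): (90, 58, 68, 48),
    (10, 1): (94, 70, 86, 64),
    (10, 15): (100, 70, 90, 66),
    (12, 1): (92, 74, 96, 78),
    (12, 15): (92, 82, 94, 84),
}


def second_look_benefit(age):
    """Mean percentage correct with two map exposures minus that with one."""
    rows = [cells for (row_age, _), cells in table1.items() if row_age == age]
    twice = [value for cells in rows for value in cells[:2]]
    once = [value for cells in rows for value in cells[2:]]
    return sum(twice) / len(twice) - sum(once) / len(once)


benefits = {age: second_look_benefit(age) for age in (8, 10, 12)}
```

On the printed cells the advantage of a second look is 14.5 percentage points for eight year olds, 7.0 for ten year olds and -3.0 for twelve year olds, in line with the ordering described in the text.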

The nearly identical performance under the 1 sec delay and 15 sec delay conditions for 
children in all age groups is highly consistent with previously published research indicating that 
there is essentially no memory loss of visual spatial information during short intervals after 
stimulus offset. The present results extend those findings by showing the generality of the 
phenomenon at markedly different levels of cognitive difficulty. 

The hypothesis that opportunities to pre-test choice responses will improve performance was 
only partially supported. It was strongly confirmed for eight year olds, less so for ten year olds, 
and not at all for twelve year olds. For the latter group, there were clearly ceiling effects on
frontal trials, but in the coordination of perspectives trials, performance averaged about 80 per 
cent correct. The failure to find an enhancement effect with the twelve year olds is puzzling in 
light of the fact that in both the Smothergill (1973) and Fishbein et al. (1977) studies, the oldest 
subjects performed substantially better under the standard present than standard absent 
conditions. One possible explanation for this ‘failure’ is that the twelve year olds were so 
efficient in extracting the relevant information from the standard stimulus, that a single additional 
opportunity to make a comparison did little. In the usual conditions of the standard being 
continuously present during responding, children may make many comparisons, and perhaps 
with older children it takes many comparisons to enhance performance. 

Another possible explanation not inconsistent with the above, is that the twelve year olds were 
responding asymptotically despite the fact that their performance was not errorless. In a recent 
experiment by Fehr & Fishbein (1976) using the identical stimuli as those of the present 
experiment, twelve year old children were asked to make similarity judgements between maps 
and the continuously present stimulus array. On 91 per cent of frontal trials they judged the 
veridical map as being veridical, and on 80 per cent of coordination of perspectives trials they 
judged the veridical map as veridical. There is obviously a close correspondence between the 
present data and those of Fehr & Fishbein. Although there are important procedural differences 
between the two experiments, the fact that under ‘standard present’ conditions the Fehr & 
Fishbein twelve year olds responded correctly 80 per cent of the time does lend credence to the 
assertion that 80 per cent correct responding in coordination of perspectives may be an upper 
level for children this age.

We now return to our initial question: is there a memory loss for spatial information following
short time intervals after removal of the standard? The answer is, probably not. The experiments 
which found response differences between standard present and standard absent conditions seem 
to have demonstrated performance increments owing to the presence of the standard, and not 
memory decrements owing to the removal of the standard. Finally, although the focus of this 
paper has been on visual spatial information, it is believed that the present analysis has 
implications for any research which has contrasted performance under standard present and 
standard absent conditions, e.g. ‘simultaneous’ vs. ‘successive’ perception and learning 
experiments.

Recall and organization in five year old children


Graham Davies and Lindsay Brown 


This study is concerned with the actual and potential memorizing strategies available to five year old 
children. Forty-eight children observed 20 objects drawn from five different categories. A factorial design 
was employed with two conditions of presentation (blocked/random) combined with two conditions of recall 
(cued/uncued). Both blocking and cueing produced significant effects upon recall and clustering, blocking 
affecting within-category recall while cueing had its main effect on the number of categories sampled. 
Subjects who received a combination of random presentation and uncued recall showed low levels of 
clustering and little relationship between clustering and recall. It was concluded that five year olds suffer 
from production deficiency rather than mediation deficiency. 


As recent reviews have demonstrated (Herriot, Green & McConkey, 1973; Jablonski, 1974) the 
mechanisms mediating recall in very young children still remain something of a mystery. In 
adults, recall efficiency appears to be largely dependent upon the degree of semantic 
organization the subject is capable of imposing upon the material (e.g. Bartlett, 1932; Mandler, 
1967). However, the evidence for similar processes in young children is much more equivocal. A 
number of studies have examined the development over age of the ability of subjects to recall 
items drawn from common categories. While all report increases in recall with age, many have 
failed to find any concomitant increase in levels of ‘clustering’: a measure of the subject’s 
tendency to recall in terms of constituent categories. This lack of relationship between measures 
of recall and organization appears particularly marked in those studies which have used children 
below nine years of age (e.g. Horowitz, 1969; Nelson, 1969; Cole, Frankel & Sharp, 1971). 
Further, as Herriot et al. point out (p. 25), where positive findings have been reported, these 
have been expressed in terms of overall differences between groups, rather than the more crucial 
within-group correlations. 

Flavell (1970) has argued that the failure of studies to find any simple correlation between 
recall and clustering need not be attributed to any fundamental dichotomy in memory processes 
between adults and children. Rather, they reflect the inability of the young child to use his 
cognitive structures spontaneously and appropriately. These children, Flavell argues, suffer from 
‘production deficiency’ rather than a more fundamental ‘mediational deficiency’. 

Support for Flavell’s views can be gained from those studies which have sought to arouse any 
dormant organizational skills present in the child through the provision of aids, either at the 
time of original presentation (encoding) or at retrieval (decoding). Thus, significant increases in 
levels of recall and clustering have been observed when children are presented with all items 
from a given category blocked together, rather than distributed through the list (e.g. Cole et al. 
1971). Equally, at retrieval, the provision of category labels, especially if accompanied by a 
request to recall each category in turn, leads to a marked improvement in organization and recall 
(Scribner & Cole, 1972). Here again, however, there are indications in the literature that such 
procedures fail to be effective with children below six years of age. Jablonski (1974) reviews a 
number of such studies which failed to find effects for blocking with this age group and reports
research of his own which was unable to produce similar effects for cueing. These results lead 
him to speculate that five year olds may suffer from something more akin to true mediational 
deficiency. 

The failure of such aids to facilitate organization and recall may indeed reflect some
qualitative difference in the performance of very young children relative to their older peers.
Alternatively, there may be certain features of the traditional free recall paradigm which are
inimical to the production of more mature cognitive strategies by this age group. There are
indications in the literature that quite small changes in procedure can produce major alterations 
in the performance characteristics of young children. Kobasigawa, for instance, was unable to 
find any influence upon performance through blocking items in kindergarten children when there 
were six items per category (Kobasigawa & Middleton, 1972) but did find a significant effect 
when only four items were represented (Kobasigawa & Orr, 1973). Again, Furth & Milgram 
(1973) only found significant effects for blocking in six year olds when all examples of a given
category were presented simultaneously and the subject actively labelled the items. Finally, 
virtually all the studies have used drawings of the objects as stimuli. There are indications from 
the concept-formation literature that young children may benefit disproportionately from the
provision of actual objects rather than pictorial stimuli (Sigel, 1968). If all these facilitating 
features were incorporated in one study, perhaps even five year old children might show benefits 
to recall and clustering associated with the provision of organizational aids. 

The experiment assessed the free recall performance of four and five year old children for a 
series of objects drawn from five common categories. Performance was examined under four 
conditions which were designed to vary the provision of organizational aids. Condition (i) 
followed the orthodox procedure of presenting items in a random order and asking the subject 
for unaided recall (unblocked-uncued). Performance of subjects in this control condition was 
compared with subjects who (ii) received facilitation at encoding the stimuli by presenting all 
items from a given category together (blocked-uncued), (iii) saw the objects in a random order 
but received assistance at decoding by being fed with the category labels at the time of recall 
(unblocked-cued) and (iv) received assistance at both encoding and decoding (blocked-cued). 

The following predictions were made. First, that if children of this age do suffer from 
production rather than mediational deficiency, then the provision of aids to organization at 
encoding and decoding will increase the levels of recall. Further, the degree of facilitation due to 
encoding and decoding aids should reflect the relative deficiency of the cognitive processes 
involved. Finally, if such aids genuinely facilitate mnemonic organization, then any increase in 
recall should be accompanied by a corresponding upsurge in the relationship between recall and 
clustering measures, both between and within groups. 


Method 
Subjects and design 


The subjects for this study were 20 girls and 28 boys attending the reception class of a middle class suburban 
primary school. Their ages ranged from 4 years 7 months to 5 years 7 months with an average of 5 years 1 
month. They were allocated to one of four groups primarily on the basis of age, although sex was 
approximately balanced across groups. These groups corresponded to the four conditions (i) 
unblocked-uncued (UU), (ii) blocked-uncued (BU), (iii) unblocked-cued (UC) and (iv) blocked-cued (BC) 
which formed the cells of a 2x2 factorial design. 


Materials 

The materials consisted of 20 objects, carefully selected so as to be easily labelled and familiar to children of 
this age. They represented four instances of five different concepts. The concepts chosen and the items 
represented were: ‘things on wheels’ (car, bus, lorry, tractor); ‘animals’ (cow, pig, horse, sheep); ‘fruit’ 
(apple, banana, pear, orange); ‘things we use when eating’ (plate, knife, cup, spoon); ‘clothes’ (vest, sock, 
bonnet, glove). 


Procedure 


The female experimenter moved into the classroom as an unpaid auxiliary for a week prior to the study to 
ensure reasonable rapport between herself and her young subjects. For the actual testing each child was 
taken in turn into an adjoining room and shown a table on which five closed boxes had been placed. They 
were told that they were to play a ‘memory game’ which involved remembering the contents of the boxes. 
They would be shown the contents of each box in turn and would be required to say aloud the names of the 
items inside as each was pointed to by the experimenter. The experimenter then took the child through each 
box in turn, replacing the lid after all the items within it had been labelled. Any idiosyncratic labelling was 
corrected by the experimenter. Presentation rate was determined by the child but the task generally took 2 
min to complete. In the unblocked conditions (UU and UC), each box contained one item from each of four 
different categories. In the blocked conditions (BU and BC), all the items from a given category appeared 
together in one box. The nature of the groupings was not explained to the subjects in either condition. 

After all the boxes had been examined, the child was asked to recall the objects. In the uncued conditions 
(UU and BU), the experimenter simply asked the subject to recall as many items as he could, the trial being 
terminated when no new response had been elicited for 30 sec. In the cued conditions (UC and BC), the 
experimenter began by naming a category and asking the subject to tell her as many of the items he had 
seen from the category as he could remember. He was taken through each of the five categories in turn, 
moving on to the next when no response was forthcoming after 10 sec. The trial was terminated when all five 
categories had been sampled. The order of recalling the categories was kept constant across subjects and 
was not the same as the order of observing the relevant boxes. Occasionally, a child might give a response 
which was inappropriate for the category currently sampled. In this case, it was recorded by the 
experimenter without comment. All conditions received two cycles of presentation and recall, the procedure 
being identical on the two trials, save that the order of observing the boxes and the items within them always 
differed from the initial trial. Subjects’ responses were transcribed by the experimenter on to a record sheet 
for future analysis. At the conclusion of the experiment, half the subjects in each condition were given all 
the objects and were asked to select the representatives of each category as it was named by the 
experimenter. This procedure served as a check upon the availability to subjects of the categories 
represented in the array. 


Results 


The children’s response protocols were scored in terms of items recalled correctly and amount 
of clustering. Considerable controversy exists over the appropriate measure for assessing the 
degree of clustering (e.g. Frankel & Cole, 1971). As one of the major aims of the present study 
was to produce data comparable to the main body of the literature, the traditional measure of 
observed minus expected clustering was employed (Bousfield & Bousfield, 1966). This measure 
has been criticized on a number of grounds, in particular that it is significantly correlated with 
absolute level of recall (Herriot et al. 1973). With these and similar criticisms in mind, all 
clustering data were recomputed using the modified version of the Bousfield index suggested by 
Hudson & Dunn (1966) which, among other advantages, is less strongly recall dependent. As the 
latter formulation yielded comparable results, the authors felt justified in presenting their data in 
terms of the traditional Bousfield measure. Repetitions and intrusions were ignored in the 
computation of both sets of data. 
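
The observed-minus-expected measure can be illustrated with a short sketch. It assumes the usual form of the Bousfield & Bousfield (1966) index, in which the expected number of category repetitions is (sum of m_i squared)/n - 1, where m_i is the number of items recalled from category i and n is the total recalled. The function name is hypothetical, and dropping repeated and uncategorized responses before scoring is only one reading of 'ignored'; it is not a record of the authors' actual scoring procedure.

```python
from collections import Counter

# Item-to-category map for the 20 objects listed under Materials.
CATEGORY = {
    "car": "wheels", "bus": "wheels", "lorry": "wheels", "tractor": "wheels",
    "cow": "animals", "pig": "animals", "horse": "animals", "sheep": "animals",
    "apple": "fruit", "banana": "fruit", "pear": "fruit", "orange": "fruit",
    "plate": "eating", "knife": "eating", "cup": "eating", "spoon": "eating",
    "vest": "clothes", "sock": "clothes", "bonnet": "clothes", "glove": "clothes",
}


def clustering_deviation(protocol, category=CATEGORY):
    """Observed minus expected category repetitions for one recall protocol."""
    # Assumption: intrusions (unlisted items) and repetitions are dropped.
    seen, cleaned = set(), []
    for item in protocol:
        if item in category and item not in seen:
            seen.add(item)
            cleaned.append(item)
    n = len(cleaned)
    if n == 0:
        return 0.0
    cats = [category[item] for item in cleaned]
    # Observed: adjacent pairs of responses drawn from the same category.
    observed = sum(1 for a, b in zip(cats, cats[1:]) if a == b)
    # Expected under random ordering: (sum of m_i squared) / n - 1.
    counts = Counter(cats)
    expected = sum(m * m for m in counts.values()) / n - 1
    return observed - expected
```

On this formula a perfectly clustered recall of all 20 objects scores 15 - 3 = 12, which is the ceiling against which the clustering means in Table 1 can be read.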

The mean numbers of items recalled by subjects in each of the four conditions on the two test 
trials are shown in Table 1. These data were subjected to a 2x2x2 analysis of variance which 
had blocks, cues and trials as main effects. Significant effects were found for blocking (F = 29.9, 
d.f. = 1, 44, P < 0.001), cueing (F = 31.03, d.f. = 1, 44, P < 0.001) and trials (F = 11.25, d.f. = 1, 44, 
P < 0.001). All effects were independent with no interactions approaching significance (all F < 1). 
There was some heterogeneity of variance in the recall data due to a ceiling effect operating in 
BC, but the sizes of the F ratios were such as to inspire confidence that the observed effects 
reflected real trends in the data. 

An analysis of variance of the clustering data again showed significant effects for blocking 
(F = 63.11, d.f. = 1, 44, P < 0.001), cueing (F = 69.35, d.f. = 1, 44, P < 0.001) and trials (F = 8.85, 
d.f. = 1, 44, P < 0.01), unencumbered by any significant interaction. Thus the provision of cues 
to facilitate semantic organization at encoding and decoding has independent and highly 
significant effects on the recall and clustering of the young subjects employed in this study. 

The mean levels of clustering in the four conditions are also shown in Table 1. It will be 
observed that UU produces exceptionally low levels of clustering, which are just significantly 
greater than chance on trial 1 (t = 3.09, d.f. = 11, P < 0.01) and trial 2 (t = 2.25, d.f. = 11, 
P < 0.05). In contrast to the other conditions, the index shows little tendency to rise over trials. 
Further information on the relationship of clustering to recall can be gathered by examining the 
correlations between recall and clustering at the within-group level (see Table 2). While the 
relationships between the two measures are very highly significant for the BC, BU and UC 
conditions, the correlation for UU fails to rise above chance level. The hypothesis that children 
would be able to capitalize on the organizational structure of the list to facilitate their recall 
when this structure was made more apparent is fully confirmed. 


Table 1. Performance of subjects in each condition as a function of trials (T1 = first trial; 
T2 = second trial) as measured by (a) items recalled and (b) levels of clustering. Each mean 
is based on 12 cases 

                          Unblocked-uncued   Blocked-uncued   Unblocked-cued   Blocked-cued 
                            T1       T2        T1       T2      T1       T2      T1      T2 
(a) Items recalled 
    Mean                   8.25     9.58     12.58    13.75   12.25    14.25   17.33   18.33 
    S.D.                   2.74     4.07      3.20     3.44    3.29     3.11    1.54    1.74 
(b) Levels of clustering 
    Mean                   1.02     1.08      5.91     6.94    5.94     7.36    9.71   10.65 
    S.D.                   1.08     1.54      2.61     2.66    2.51     2.39    1.17    1.36 


Table 2. Within-groups correlations between number of items recalled and measured clustering 
in the four conditions as a function of trials (T1 = first trial; T2 = second trial) and conditions 
(Spearman’s rank correlation coefficient) 

                          Unblocked-uncued   Blocked-uncued     Unblocked-cued     Blocked-cued 
                            T1       T2        T1       T2        T1       T2        T1       T2 
    Correlation           +0.58    +0.57     +0.89    +1.00     +0.96    +0.94     +0.85    +0.99 
    P                      n.s.     n.s.   < 0.001  < 0.001   < 0.001  < 0.001   < 0.001  < 0.001 


The effects of blocking and cueing appear to be equivalent and additive; UU producing the 
worst performance, BC the best and BU and UC showing intermediate effects. However, 
although BU and UC are approximately the same in their levels of recall and clustering, the way 
in which that performance is achieved appears to be rather different. This is revealed by 
examining the scores of subjects in terms of the number of categories represented in their 
protocols and the average number of items recalled per category on each of the two trials. 

If subjects are partitioned into those who draw upon all five categories for their recall and 
those who use fewer, differing effects are observed due to blocking and cueing. On trial 1, 19 of 
the 24 subjects in the cued conditions recalled from all five categories whereas only three of the 
non-cued subjects accomplished this, a highly significant difference (χ² = 10.08, d.f. = 1, 
P < 0.005). A similar effect emerged on trial 2 where 21 of the subjects using cued recall sampled 
from all categories and only 11 were found in the non-cued conditions (χ² = 4.10, d.f. = 1, 
P < 0.05). An analysis of the same data in terms of blocked and unblocked presentation showed 
that this factor had no discernible effect on the number of categories sampled by subjects on 
either trial. 

An examination of the effects of blocking and cueing on items recalled per category reveals a 
contrary trend to that for category recall. A 2x2x2 analysis of variance indicated that while 
significant increases in item recall arose from blocking (F = 81.97, d.f. = 1, 44, P < 0.001), cueing 
(F = 20.29, d.f. = 1, 44, P < 0.001) and trials (F = 16.24, d.f. = 1, 44, P < 0.001), blocking 
produced the major effect. This is supported by scrutiny of the proportion of variance 
attributable to each of the factors (see Kirk, 1968, p. 134). Blocking accounts for 48 per cent of 
the variance whereas cueing (12 per cent) and trials (2 per cent) have relatively small effects 
upon item recall. 

Thus, although BU and UC produce equivalent performance, the significance of category and 
within-category recall appears to differ in the two conditions, BU benefiting from greater 
within-category recall, UC gaining its major advantage through wider sampling of categories. 
When both factors are combined, as in BC, then the subject is able to approach his optimal 
performance on this type of material. 


Discussion 

In the present study, the recall and organization of five year old subjects showed large and 
significant increases through the provision of blocked presentation and cued recall. The selective 
facilitation at the level of category and within category recall shows many parallels with the 
findings from older subjects (see, for instance, Halperin, 1974). The success of the present study 
in obtaining these effects with such a young sample can probably be ascribed to the various 
methodological and procedural changes noted in the introduction. 

The data clearly support the view that children of five years of age suffer from production 
deficiency rather than any form of mediational deficiency. Given the appropriate conditions, they 
can and do encode and retrieve materials in terms of semantic categories, and these categories 
serve to facilitate recall in much the same way as with adults. Obviously, it would be possible to 
designate concepts or category labels beyond the comprehension of a five year old (‘mammals’, 
‘alloys’, etc.) and on which the child would be technically mediationally deficient. The more 
limited range of concepts available to the younger child is not in dispute; what the data appear to 
demonstrate is that given the availability of the necessary structures, the young child can be 
induced to employ them in the same manner as his older peers. 

One of the most striking findings of the present experiment was the very high within-group 
correlations between levels of recall and clustering in the conditions where aids were provided. 
To the degree that the clustering and recall scores are dependent, these figures are somewhat 
inflated; nevertheless, they indicate a strong relationship between organization and recall. The 
tendency for blocked presentation to produce high correlations between clustering and recall has 
been noted by Furth & Milgram (1973). 

One inherent danger in all studies which seek to explore the possibilities of production 
deficiency is that the experimental situation will be so altered as to change radically the 
characteristics of the task. In this connection one source of possible artifact should be 
discounted. This is that cueing has an effect simply because the cues act as clues, inviting the 
child to generate a number of plausible responses, some of which are likely to be correct. In 
common with most other published studies of cueing (e.g. Tulving & Pearlstone, 1966), rates of 
overt intrusions were very low (an accumulation of 23 in 96 trials) and evenly distributed 
between cued and non-cued conditions. Thus an explanation of these results in terms of an 
alteration in task requirement seems unlikely. 

Given the obvious benefit to performance produced by encoding and decoding cues, the very 
low level of overall recall and lack of spontaneous clustering observed in the UU condition 
becomes all the more surprising. This failure cannot be attributed to lack of availability of the 
concepts involved: all subjects who were tested for knowledge of the categories at the 
conclusion of the experiment accomplished the task successfully. Nor can their poor 
performance be attributed to any tendency to structure their recall in terms of the boxes in 
which the items appeared, rather than the categories from which they were drawn. Rescoring the 
UU clustering data in terms of boxes rather than categories failed to show any significant 
organization attributable to that factor. However, comparison of the results of the BU and UC 
conditions with those of UU does provide a number of clues as to why these unaided controls 
performed so badly. 

Subjects in UC are encoding the information in exactly the same circumstances as those in 
UU, yet their performance is much superior. This indicates that the poor performance of the 
unaided controls cannot be attributed to idiosyncratic or immature categorizing of the stimuli as 
Jablonski (1974) and others have suggested. Given that cues for retrieval must be encoded at 
presentation to be effective (Tulving & Thomson, 1973), both groups must have had recourse to 
appropriate category information at the time the materials were first exposed. As we have noted, 
the superiority of UC over UU derives mainly from the extra categories sampled by the former 
group. However, there is an effect over and above this, indicating that cueing also increases the 
number of items recalled from a given category. In other words, subjects in UU not only recall 
fewer categories, they also retrieve the material associated with them less efficiently and 
exhaustively (cf. Kobasigawa & Orr, 1973). This is also underlined by the rate of repetitions in 
the cued and uncued groups. Subjects in the uncued conditions amassed a total of 55 repetitions 
between them (UU = 35; BU = 20) while those in the cued groups managed only 13 (UC =7; 
BC = 6) suggesting that the subjects’ unaided retrieval plans tend to be rather haphazard and 
redundant. 

However, retrieval inefficiency is not the only difficulty facing subjects in UU, as a 
comparison with those in BU makes clear. As has been noted, blocking affects the number of 
items retrieved per category, suggesting that contiguous presentation ensures that like items are 
in some sense stored together and thus retrieved together. The spatial array might ensure that 
the items formed a rehearsal set as Rundus (1971) has suggested, or perhaps more appropriately 
for children of this age, that they are stored in terms of an image which may be reintegrated at 
recall (Horowitz, Lampel & Takanishi, 1969). 

In conclusion, these results suggest that given appropriate conditions, five year olds can and 
will show a benefit to their levels of recall from the provision of aids which emphasize the 
semantic relatedness of the stimuli. However, in the absence of such aids, children of this age 
provide little evidence of planful memory activity. In the present study, this inadequacy reflected 
incompetent memory skills, both at the encoding and retrieval stages, rather than a fundamental 
ignorance of the categories involved in the task. 
Tests of a structural theory of the memory trace* 


Gregory V. Jones 


Jones (1976) has shown that the memory trace resulting from the viewing of a picture corresponds to a 
‘fragment’ of that picture. The present paper shows that the fragmentation hypothesis also correctly 
represents the recall of memories derived from sentences. The paper is in four sections. First, an experiment 
is reported which confirms that the earlier results of Jones (1976) cannot themselves be attributed to verbal 
recoding of the visual stimuli used there. Second, the re-analysis of four large experiments of Anderson & 
Bower (1973) investigating the cued recall of sentences shows that sentences (both imaged and non-imaged) 
also give rise to fragments in memory. Third, the fragmentation hypothesis is shown to be consistent with 
the results of several other experiments investigating the multiple cuing of sentences, even though the 
storage of configural information is not assumed. Fourth, the status of the fragmentation hypothesis as a 
general theory of memory is discussed in the light of a distinction drawn between structural and 
compositional representations of memory. 


It is difficult to deny that memory is a connective process: people’s names spring to mind when 
one sees them or thinks of them, rather than at random. Thus if an event which has been 
committed to memory is represented simply by the two components, a and b, one clear goal of 
memory research should be to characterize the circumstances in which provision of component a 
leads to recall of component b. However, this goal is a misleading one if adhered to strictly, 
since it does not allow full examination of the structure of the memory process, as revealed by 
the interrelationships in recall between several different aspects of a single event. 

Investigation of the structure of the memory process may be carried out in a direct way using 
the method of cued recall (see Tulving & Bower, 1974), in which one or more components of a 
situation are provided as cue for the recall of its other components. The research to be discussed 
here employed what may be termed orthogonal cued recall, so called because the material to be 
remembered is composed of several components which are each independent. The results of 
these experiments show that pictures, sentences and visual images are mnemonically isomorphic: 
all three give rise to the same structural unit of memory, the fragment (Jones, 1976). 

A fragment is the stored trace corresponding to part of a perceived situation. It has the 
following surprisingly simple properties. Each fragment can be represented as a certain 
combination of the stimulus components which are under explicit investigation as cues and 
answers, though it will often contain information relating to other aspects of the situation as 
well. It is postulated that a fragment is rendered accessible to recall if and only if one of its 
components is used as a cue. Hence whereas the original perceptual selection of certain aspects 
of a situation but not others to form a fragment is a probabilistic (though consistent) process, it 
is postulated that recall of a fragment is all-or-none. The fragmentation hypothesis thus affords a 
detailed quantitative representation of the relative frequencies with which different types of both 
single and multiple cues should be able to induce recall of different components of the items to 
be remembered. 
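
A minimal sketch of how the hypothesis turns fragment probabilities into predicted cued-recall frequencies is given here. It assumes, for simplicity, that each item leaves at most one fragment; the function name and the probabilities are hypothetical illustrations and are not values taken from Jones (1976).

```python
def recall_probability(fragments, cues, target):
    """Probability that `target` is recalled when `cues` are presented.

    `fragments` maps a frozenset of component labels to the probability
    that the stored trace is exactly that fragment.
    """
    cues = set(cues)
    if target in cues:
        raise ValueError("target must not be one of the cues")
    # A fragment is accessed if and only if it contains at least one cue;
    # once accessed, all of its components are recalled (all-or-none).
    return sum(
        p for components, p in fragments.items()
        if target in components and components & cues
    )


# Hypothetical fragment probabilities for components C, L, O and S.
EXAMPLE = {
    frozenset("CO"): 0.3,
    frozenset("CLOS"): 0.2,
    frozenset("L"): 0.1,
    frozenset(): 0.4,  # nothing stored
}
```

With these illustrative values, cueing with C recalls O whenever the CO or the CLOS fragment was stored, whereas cueing with L recalls O only through the CLOS fragment. Adding L to a C cue leaves recall of O unchanged, because every fragment containing both L and O already contains C.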

It has previously been demonstrated (Jones, 1976) that the results of an experiment 
investigating the cued recall of pictures were in quantitative agreement with the fragmentation 
hypothesis. It is possible, however, that this result might not generalize to the processing of 
other types of material. In particular, the fragmentation hypothesis may not apply to memory for 
verbal material. This is investigated here by a detailed re-analysis of the results of four 
large-scale experiments investigating memory for sentences which were reported by Anderson & 
Bower (1973, chapter 10), and by qualitative consideration of the results of Anderson & Bower 
(1972), Foss & Harwood (1975) and Anderson (1976). 

First, however, another possibility must be investigated. This is that the recall investigated by 
Jones (1976) was itself verbal. Thus although precautions were taken in that experiment to 
encourage solely visual processing (e.g. answer booklets contained visual rather than verbal 
representations of items), it is possible that subjects simply recoded items verbally. For 
example, a picture with a yellow cup in the top left corner may have been encoded as the verbal 
string ‘yellow, cup, top left’. The implication of this possibility is that if the experiment of Jones 
(1976) were to be repeated with verbal instead of visual representation of pictorial material the 
level of recall should either be unchanged or else it should be facilitated due to the recoding 
stage now being unnecessary. Such a finding has been reported by Wells (1972). Unlike other 
investigators (see Paivio, 1971), Wells examined the recall of visual and verbal stimuli whose 
dimensionalities were strictly equated (e.g. subjects remembered either a large red dotted blob or 
else the words ‘large red dotted’). It was found that immediate recognition performance was 
equal in the two cases. Thus it may be that subjects encode explicitly dimensionalized pictures 
purely verbally, in which case the picture recall studied by Jones (1976) would not have 
benefited from the usual superiority of visual over verbal recall (see Paivio, 1971). 

To distinguish between these possibilities, an experiment was carried out whose design 
differed from that of Jones (1976) in three ways. First, subjects were shown words instead of 
pictures to remember. Second, answer booklets displayed cues and possible answers verbally 
instead of visually. Third, there was again a delay between presentation and recall. For half the 
subjects this was filled as before by their counting backwards in threes aloud (Peterson & 
Peterson, 1959). However, in case this activity selectively interfered with verbal memories, the 
other half carried out a visual card-sorting task instead. 


Experiment 

Method 

Subjects. The subjects of this experiment were 54 male students of Cambridge University who were each 
paid 50p for participating. 


Stimuli. Subjects were shown sets of nine slides to remember. Each slide gave a verbal description of an 
object: a colour (C) on the top line, an object-type (O) on the middle line, and a location (L) on the bottom 
line, for example YELLOW/CUP/TOP LEFT. Subjects also had to remember the sequential position (S) of each 
item within the set. 


Questions. After a delay, the recall of each set of items was tested. Subjects were given answer booklets, 
each page of which represented a particular item. Each page showed the nine possible alternatives for each of 
the four attributes: the relevant words for C, O, and L, and the ordinal positions first to ninth for S. On each 
page, the particular value of either one, two, or three of an item’s four attributes had been marked by the 
experimenter. The subject had to attempt recall of the corresponding missing three, two, or one attributes. 


Procedure. Each subject was given a detailed set of instructions to read. The instructions fully informed the 
subjects of the nature of the task; in addition the experimenter paraphrased them informally. 

The slides were projected by a Kodak Carousel S-AV linked to an electronic timer. After a ‘get ready’ 
slide the timer controlled the automatic exposure of nine slides in succession. The cycle time was 2.2 sec 
(1.1 sec exposure, 1.1 sec dark interval). After the ninth slide, a tenth one appeared and was exposed for 25 
sec. This was either a three-figure number (e.g. 461) from which subjects counted aloud backward in threes, 
or else it instructed subjects to ‘sort cards’. The cards provided showed silhouettes of several different 
jigsaw-puzzle pieces, and could be distinguished verbally only with difficulty. 


Results 

The nature of the distracting activity in the delay interval did not significantly affect the level of 
performance in this experiment, which was much lower than in that of Jones (1976). These two 
points are revealed by examination of the total number of correct answers (regardless of their 
types) made by each subject; the maximum possible score was 180 per subject. First, in the 
present experiment the average scores for counting and for card-sorting delays were 37.5 and 
38.3, respectively; a Kolmogorov-Smirnov two-sample test revealed no significant differences 
between the two distributions (K_D = 5, n = 27). Second, when these two sets of data were 
merged, it was found that the resulting distribution of scores was highly significantly lower 
(K_D = 39, n = 54, P < 0.001) than that in the corresponding visual experiment of Jones (1976), the 
average scores being 37.9 and 66.7, respectively. 


Table 1. Overall percentage correct for each combination of cue-type and answer in the verbal 
experiment (impairments relative to pictorial results shown in parentheses) 

                                        Answer 
Cue type    C              L              O              S 
C           -              14.0 (18.8)    32.3 (9.4)     15.9 (3.9) 
L           17.3 (13.9)    -              18.1 (18.9)    17.9 (9.5) 
O           30.3 (14.0)    15.4 (25.9)    -              15.2 (9.0) 
S           19.5 (11.6)    14.5 (27.4)    21.6 (15.1)    - 
CL          -              -              30.2 (17.0)    15.3 (9.1) 
CO          -              10.3 (33.0)    -              16.9 (13.3) 
CS          -              17.8 (24.3)    34.4 (17.8)    - 
LO          35.6 (6.8)     -              -              22.8 (11.8) 
LS          19.6 (9.5)     -              18.2 (24.3)    - 
OS          39.7 (2.0)     18.2 (27.8)    -              - 
CLO         -              -              -              19.0 (20.1) 
CLS         -              -              29.4 (29.7)    - 
COS         -              16.3 (37.0)    -              - 
LOS         37.8 (13.7)    -              -              - 

Note: C = colour, L = location, O = object type, S = sequential position. A dash marks a cell 
in which the attribute was itself a cue. 


A more detailed measure of the level of recall is that shown in Table 1: the overall percentage 
of correct answers for each combination of cue-type and answer-component. Each percentage is 
lower than the corresponding one in Table 1 of Jones (1976), with the difference shown here in 
parentheses. It can be seen that recall involving location was especially impaired, with an 
average decrement of 19.1 per cent (otherwise 10.5 per cent) when location was either cue or 
answer in single-cueing, and of 28.4 per cent (otherwise 12.4 per cent) and 37.0 per cent 
(otherwise 21.2 per cent) when it was the answer to double and triple cues respectively. This relative 
impairment is similar to that found by Pellegrino, Siegel & Dhawan (1975) for the retention of 
information concerning the location of presentation of drawings and their labels. Since only 
colour and object-type were recalled well above the chance level (approximately 1/9, i.e. 11.1 
per cent), the data of this experiment do not convey further information regarding the effect of 
multiple-cuing. 
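
The average decrements quoted above can be checked directly from the parenthesized entries of Table 1. The sketch below is only an arithmetic check; the function names are hypothetical, and the grouping that reproduces the quoted figures treats location as cue or answer for single cues, and as the answer for double and triple cues.

```python
# Decrements (percentage points) from Table 1, keyed by (cue, answer).
DECREMENTS = {
    ("C", "L"): 18.8, ("C", "O"): 9.4, ("C", "S"): 3.9,
    ("L", "C"): 13.9, ("L", "O"): 18.9, ("L", "S"): 9.5,
    ("O", "C"): 14.0, ("O", "L"): 25.9, ("O", "S"): 9.0,
    ("S", "C"): 11.6, ("S", "L"): 27.4, ("S", "O"): 15.1,
    ("CL", "O"): 17.0, ("CL", "S"): 9.1,
    ("CO", "L"): 33.0, ("CO", "S"): 13.3,
    ("CS", "L"): 24.3, ("CS", "O"): 17.8,
    ("LO", "C"): 6.8, ("LO", "S"): 11.8,
    ("LS", "C"): 9.5, ("LS", "O"): 24.3,
    ("OS", "C"): 2.0, ("OS", "L"): 27.8,
    ("CLO", "S"): 20.1, ("CLS", "O"): 29.7,
    ("COS", "L"): 37.0, ("LOS", "C"): 13.7,
}


def involves_location(cue, answer, n_cues):
    """Grouping rule assumed here: cue or answer for single cues, answer otherwise."""
    if n_cues == 1:
        return "L" in (cue, answer)
    return answer == "L"


def location_split(n_cues):
    """Mean decrement (with location, without location) for a given number of cues."""
    with_l, without_l = [], []
    for (cue, answer), value in DECREMENTS.items():
        if len(cue) != n_cues:
            continue
        group = with_l if involves_location(cue, answer, n_cues) else without_l
        group.append(value)
    return (round(sum(with_l) / len(with_l), 1),
            round(sum(without_l) / len(without_l), 1))
```

Calling `location_split(1)`, `location_split(2)` and `location_split(3)` returns (19.1, 10.5), (28.4, 12.4) and (37.0, 21.2), in agreement with the text.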


Memory for sentences 


The orthogonal cued recall experiment of Jones (1976) established that the fragment was the 
structural unit of memories which, it has now been shown, did not result from verbal re-coding. 
It is possible to investigate memory for sentences in the same way. Instead of sets of pictures all 
composed of the same visual dimensions, sets of sentences all of the same pattern can be used 
as material to be remembered. Four such experiments have been reported by Anderson & Bower 
(1973, chapter 10). Each experiment is treated separately below, and the fragmentation 
hypothesis compared with Anderson and Bower’s own model in each case. 
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Table 2. Representation of the data of Expt1 of Anderson & Bower (1973) by the fragmentation 
hypothesis and by HAM 








Theoretical 
Fragment Sa E 
Event type Data Fragmentation HAM 
l L -AVO I 93 88-3 89-0 
2 L>AV 2 17 20-2 212 
3 L+AO 3 33 35-3 39-2 
4 LoA 6 41 i 36 8 35-1 
5 L-VO 4 33 28-3 32-6 
6 L=>V 7 23 21:3 23-4 
7 LO 8 37 34-0 39-4 
8 A>LVO I 87 88-3 88-8 
9 A>LV 2 20 20-2 23-0 
10 A->LO 3 36 35:3 42-4 
1] A>L 6 43 36 8 39-1 
12 A—>VO 5 36 25-0 26:3 
13 A>V 9 19 17:3 220 
14 A->0O 10 36 29:2 35-5 
15 V-»>LAO 1 78 88-3 74-2 
16 V>LA 2 18 20-2 ` 198 
17 V>LO 4 30 28:3 30-3 
18 VL 1 24 21-3 24-0 
19 V—AO 5 21 25-0 22-4 
20 VA 9 16 17:3 19-2 
21 V>0 11 30 30:8 34-9 
22 O->LAV 1 95 88 3 87:8 
23 O-LA 3 38 35:3 39-6 
24 0>LV 4 30 28-3 32-9 
25 O>L 8 41 34-0 40-7 
26 O-AV 5 22 25-0 24-7 
27 O-A 10 29 29:2 31-7 
28 Oo-V 1] 31 30-8 37-7 
29 L->A+, A>L+ 12 5 5-0 3-7 
30 V>0+, O79 V+ 12 5 5-0 3-7 
31 L~+:A>VO 5 6 83 ' 9-7 
32 L>:A3>V 9 6 5-8 4-9 
33 L-+:A>0O 10 ll 9-7 77 
34 L-:V->AO 5 4 8-3 8-1 
35 L>:V>5>A 9 5 5-8 43 
36 L+>.V>0O 11 11 10-3 77 
37 L-+:0>AV 5 lI 8-3 9-5 
38 L>:03A 10 8 9-7 73 
39 L>:0>V 11 11 10-3 8-4 
40 A>:L-+VO 4 6 9-4 11-9 
41 A>: L>V 7 4 71 4-8 
42 A>:L3>0O 8 4 11:3 8-2 
43 A->:V>LO 4 5 94 10-2 
44 A->:V>L a 7 71 4-7 
45 A~:V30 H 13 10-3 77 
46 A>.0-LV 4 9 9-4 11-8 
47 A>:0>5L 8 10 11:3 8-4 
48 A->.0>V 11 13 10-3 84 , 
49 V>:L+AO 3 12 11-8 18-6 
Table 2 (cont.)

                                      Theoretical
                 Fragment
    Event type   type      Data   Fragmentation   HAM
50  V→:L→A          6     14     12.3      8.8
51  V→:L→O          8     11     11.3     10.9
52  V→:A→LO         3      8     11.8     19.1
53  V→:A→L          6     13     12.3      9.9
54  V→:A→O         10      7      9.7     10.0
55  V→:O→LA         3     14     11.8     18.0
56  V→:O→L          8     14     11.3     10.9
57  V→:O→A         10     11      9.7      9.1
58  O→:L→AV         2     13      6.7      8.7
59  O→:L→A          6      8     12.3      6.9
60  O→:L→V          7      4      7.1      4.9
61  O→:A→LV         2      5      6.7      8.8
62  O→:A→L          6     10     12.3      7.6
63  O→:A→V          9      3      5.8      5.0
64  O→:V→LA         2      8      6.7      7.4
65  O→:V→L          7      9      7.1      4.7
66  O→:V→A          9      4      5.8      4.3
67  LA→:V→O        11      7     10.3      7.0
68  LA→:O→V        11      7     10.3      7.9
69  LV→:A→O        10      7      9.7      8.7
70  LV→:O→A        10      8      9.7      8.3
71  LO→:A→V         9     11      5.8      4.6
72  LO→:V→A         9      5      5.8      3.9
73  AV→:L→O         8      9     11.3      9.2
74  AV→:O→L         8     10     11.3      9.2
75  AO→:L→V         7      7      7.1      4.4
76  AO→:V→L         7   [illegible]   7.1      3.9
77  VO→:L→A         6      7     12.3      7.9
78  VO→:A→L         6     11     12.3      8.4
79  AVO→:          15    386    369.1    366.2
80  LVO→:          15    368    369.1    370.2
81  LAO→:          15    358    369.1    363.8
82  LAV→:          15    365    369.1    367.2


Note: L = location, A = agent, V = verb, O = object.


Anderson & Bower Expt 1 


Orthogonal cued recall of sentences of the type ‘In the Location the Agent Verb the Object’ (LAVO; for
example, ‘In the park the hippie touched the debutante’) was investigated. There were two major differences
from the method used by Jones (1976).

First, instead of using a small number of values for each type of component, arranged orthogonally over 
sets of stimuli, Anderson & Bower employed each of a large number of alternatives once only, the 
combination of values in any particular sentence being randomly determined. This also ensured the 
independence of the different components. In addition, since the range of alternatives was very large, it may 
be assumed that no correct responses occurred purely by guessing, and this simplifies the analysis. 

Second, Anderson & Bower employed incremental cuing: each sentence was cued with one, then two,
then three of its key words, instead of one or two or three. The incremental-cue method gives even more
information about each individual memory, although the independence in retrieval of single-, double- and
triple-cued responses now has to be assumed.
Results


Table 2 gives the number of observations falling in each of the 82 recall patterns reported by 
Anderson & Bower. The table is divided into five sections as follows: 

Lines 1 to 28: here the first cue evoked some recall, but later cues did not. The nomenclature 
may be exemplified by line 6: L→V means that L evoked V, but not A or O. Each line is
summed over six combinations of second and third cues. 

Lines 29 and 30: here the first cue evoked one word, and the second cue an additional one. 
Note that four other potential classes of data of this type (‘unpredicted’ second recalls) have 
been reallocated by Anderson & Bower, as though the extra word recalled at the second cuing 
had also been recalled at the first cuing. Each line is summed over 12 cue sequences: two 
different first-cue types, each summed over six combinations of second and third cues. 

Lines 31 to 66: here the first cue was unsuccessful, but the second cue evoked some recall. 
For example, A→:L→V (line 41) means that cue A was unsuccessful, but then L evoked V
(but not O). Each line is summed over the two third-cue types. 

Lines 67 to 78: here the first two cues were unsuccessful, but the third evoked recall. Each 
line has been summed over the two different orders of first and second cue: for example line 67 
(LA→:V→O) represents the cue sequences L-A-V and A-L-V.

Lines 79 to 82: here all three cues were unsuccessful. Each line is summed over the six 
possible sequences of cue. 

Thus the theoretical maximum numbers of observations in the five classes were in the 
proportions 6:12:2:2:6, their actual values being 736, 1476, 246, 246 and 738 respectively. 


Fragmentation analysis. The observed patterns of recall are shown here to derive from the 15 
different types of fragment shown in Table 3. Types 12, 13 and 14 represent the occurrence of 
‘twin-fragments’: two independent fragments derived from the same sentence. A fragment 
corresponds to the particular combination of aspects of an event which are stored in memory at 
a particular time. Access to the fragment occurs if, and only if, one (or more) of its components 
corresponds to the cue (or cues) provided for its retrieval. One consequence is that the data 
shown in Table 2 should display specified sets of symmetry. For example, P(AVO|L),
P(LVO|A), P(LAO|V) and P(LAV|O) should have equal values, each pattern occurring with the
probability of a Type 1 (LAVO) fragment. Table 2 shows that these patterns did indeed have
approximately equal frequencies of occurrence (93, 87, 78 and 95 respectively), the mean
estimate of the LAVO fragment’s probability therefore being 0.1196. The validity of the
hypothesis for the whole corpus of data may be tested in this way. 
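
As a worked illustration of the symmetry argument, the following Python sketch pools the four
first-cue counts quoted above and converts their mean into a probability. The figure of 738
observations per pattern is an assumption made here (it is the value implied by the Fragmentation
column of Table 2); the variable names are illustrative only.

```python
# Counts of complete recall to each single first cue, as quoted in the text:
# P(AVO|L), P(LVO|A), P(LAO|V) and P(LAV|O) respectively.
complete_recall_counts = {"L": 93, "A": 87, "V": 78, "O": 95}

# ASSUMPTION: number of observations underlying each first-cue pattern.
N_PER_PATTERN = 738

# Under the fragmentation hypothesis all four counts estimate the same quantity,
# so they can simply be averaged.
mean_count = sum(complete_recall_counts.values()) / len(complete_recall_counts)
p_lavo = mean_count / N_PER_PATTERN

print(f"mean count = {mean_count:.2f}")
print(f"estimated P(LAVO fragment) = {p_lavo:.4f}")
```

With these inputs the estimate agrees with the value of 0.1196 given in the text.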

Table 3. Percentage occurrence of each type of fragment in Expt 1 of Anderson & Bower (1973)

                                       Occurrence (%)
                                  Anderson & Bower's model (a)
    Fragment      Experimental
    type          estimate        tb = 1.3      tb = 3.9
 1  LAVO             11.96           13.8          48.2
 2  LAV               2.74            2.1           2.4
 3  LAO               4.78            2.1           2.4
 4  LVO               3.83            2.1           2.4
 5  AVO               3.39            5.1           5.5
 6  LA                4.98            2.0           1.6
 7  LV                2.88            0.9           0.7
 8  LO                4.61            0.9           0.7
 9  AV                2.34            2.7           2.0
10  AO                3.96            2.7           2.0
11  VO                4.17            5.3           3.8
12  LA, VO            0.34            1.0           1.1
13  LV, AO            - (b)           -             -
14  LO, AV            - (b)           -             -
15  Null             50.02           59.4          27.2
    Total           100.00          100.0         100.0

Note: L = location, A = agent, V = verb, O = object.

(a) The values shown for Anderson & Bower's model constitute only an intermediate stage in that
representation.

(b) These two types of twin-fragment did occur in the original data, as 14 instances of ‘unpredicted second
recalls’ (Anderson & Bower, 1973, p. 297). However, since HAM predicts zero frequencies for these
categories, Anderson & Bower reclassified them as though the two recalled words had both been correct on
the first trial (see Anderson & Bower, 1973, p. 305). That is, the relevant observations were redistributed in
the published data among patterns corresponding to fragment types 2, 3, 4 and 5.


If the data of Table 2 had been published in an exhaustive manner (that is, without the
pooling of certain sequences of cuing in lines 29-30 and 67-82), the data of each cell could be
classified both by its first-cue type and by its (hypothesized) underlying fragment type. Thus a
15 × 4 table could be constructed to show the frequency distribution of the 15 types of fragment
for each of the four first-cue types. The fragmentation hypothesis predicts that such a table
should be statistically homogeneous. Since certain of the published data have been summed over
two different first-cue types, it is not in practice possible to assign each datum to a unique
combination of fragment type and first-cue type. It is, nevertheless, still possible to test the
hypothesis in an equivalent manner by examining, for each of the 15 fragment types in turn, the
homogeneity of the observed fragment frequencies for the four (or just three) distinct first-cue
classes. Similar reasoning applies to those data hypothesized to derive from the same type of
fragment but differing only in their second cues. For example, all three of the patterns
O→:L→AV, O→:A→LV, and O→:V→LA (lines 58, 61 and 64, respectively) have the same
first cue (O) and should derive from the same fragment (LAV). In all, it is possible to test one
twofold, ten threefold, six fourfold, and six sixfold symmetries. The overall fit of the
fragmentation hypothesis is extremely good, 2I = 60.52, d.f. = 69, P > 0.75. (Likelihood ratio
analysis of contingency tables is employed; the resulting statistic, termed the Minimum
Discrimination Information Statistic (2I) by Kullback (1959, p. 85), is asymptotically chi-square
distributed with an appropriate number of degrees of freedom. Since Anderson & Bower (1973)
used the Pearson χ² statistic (which in large samples approximates 2I), the value of this is also
given on occasion to facilitate comparison of goodness-of-fit.)
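
The relation between the likelihood-ratio statistic 2I and the Pearson χ² statistic can be sketched
in a few lines of Python. This is a minimal illustration of a homogeneity test on a two-way table,
not necessarily the exact tabulation used in the analysis reported here; the function names are
hypothetical, and the example table (recalled versus not recalled for the four complete-recall
patterns) assumes 738 observations per pattern.

```python
import math

def expected_counts(table):
    """Expected cell counts under homogeneity for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def two_i(table):
    """Likelihood-ratio statistic 2I = 2 * sum(O * ln(O / E)); empty cells contribute 0."""
    exp = expected_counts(table)
    return 2.0 * sum(o * math.log(o / e)
                     for obs_row, exp_row in zip(table, exp)
                     for o, e in zip(obs_row, exp_row) if o > 0)

def pearson_chi2(table):
    """Pearson statistic, sum((O - E)^2 / E)."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))

def degrees_of_freedom(table):
    return (len(table) - 1) * (len(table[0]) - 1)

# Fourfold symmetry for complete recall: counts quoted in the text, with the
# complementary counts derived from the ASSUMED total of 738 per pattern.
recalled = [93, 87, 78, 95]
fourfold = [recalled, [738 - n for n in recalled]]

print(two_i(fourfold), pearson_chi2(fourfold), degrees_of_freedom(fourfold))
```

In samples of this size the two statistics differ only slightly, which is why either can be used
when comparing goodness-of-fit.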

Exactly the same test could be carried out by another route. Maximum-likelihood estimates of 
the probability of occurrence of each type of fragment (shown in Table 3) can be derived by 
averaging, and used to generate the expected frequencies of the answer patterns (shown in Table 
2) for comparison with the observed values. 
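
The second route can be illustrated as follows. The sketch multiplies the fragment percentages of
Table 3 by the number of observations in a class of Table 2 to obtain expected frequencies. The
class sizes of 738 (first-cue patterns) and 246 (second-cue patterns) are assumptions, chosen
because they reproduce the Fragmentation column of Table 2; the names are illustrative.

```python
# Experimental estimates from Table 3 (per cent).
fragment_pct = {
    "LAVO": 11.96, "LAV": 2.74, "LAO": 4.78, "LVO": 3.83, "AVO": 3.39,
    "LA": 4.98, "LV": 2.88, "LO": 4.61, "AV": 2.34, "AO": 3.96, "VO": 4.17,
    "LA,VO": 0.34, "Null": 50.02,
}

# ASSUMED numbers of observations per pattern in two of the classes of Table 2.
N_FIRST_CUE = 738    # lines 1 to 28 and 79 to 82
N_SECOND_CUE = 246   # lines 31 to 66

def expected_frequency(fragment, n_observations):
    """Expected count of an answer pattern generated by a single fragment type."""
    return fragment_pct[fragment] / 100.0 * n_observations

for fragment, n in [("LAVO", N_FIRST_CUE), ("LV", N_FIRST_CUE),
                    ("LV", N_SECOND_CUE), ("Null", N_FIRST_CUE)]:
    print(fragment, n, round(expected_frequency(fragment, n), 1))
```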


Anderson & Bower's analysis. Anderson & Bower (1973) postulated that recall of a sentence is 
governed by the abstract propositional structure allotted to it in the acronymic computer 
program Human Associative Memory. HAM’s linguistic parser translates sentences into binary 
labelled trees, each stored wholly or in part as a corresponding collection of binary associations. 
This formulation gives rise to both qualitative and quantitative predictions. 

Anderson & Bower’s qualitative prediction concerned whether or not certain patterns of recall 
to first and to second cues could occur. According to HAM’s propositional representation, both 
L and A are linked to a Fact (F) node, while both V and O are linked to a Predicate (P) node;
the two clusters are themselves joined only by an FP link. Thus it can be seen that the initial 
recall of V alone when cued with L means that links LF and FP (and PV) must be intact, but PO 
and AF are broken (otherwise O and A would also be recalled). This means that O should not be 
capable of recall on a second cuing with A (or A with O). The fragmentation model, on the other 
hand, admits such an answer pattern; it corresponds to an (LV, AO) fragment — two independent 
traces formed from the same item. In the experiment (LV, AO) and (LO, AV) patterns appeared
almost as frequently as the geometrically permissible (LA, VO) patterns, providing evidence 
against HAM and in favour of the alternative fragmentation hypothesis. 
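
The qualitative argument can be checked mechanically. The sketch below encodes the five links of
the tree as described above, enumerates every combination of intact and broken links, and asks what
cue A can retrieve whenever cue L has retrieved V alone. It then shows the same answer pattern
arising from a twin-fragment. The code is an illustration written for this purpose, not part of
HAM; all function names are hypothetical.

```python
from itertools import combinations

# The tree described in the text: L and A attach to a Fact node F,
# V and O attach to a Predicate node P, and F and P are joined by one link.
LINKS = [("L", "F"), ("A", "F"), ("F", "P"), ("P", "V"), ("P", "O")]
WORDS = {"L", "A", "V", "O"}

def tree_recall(cue, intact_links):
    """Words reachable from the cue through intact, symmetric links."""
    seen, frontier = {cue}, [cue]
    while frontier:
        node = frontier.pop()
        for a, b in intact_links:
            for src, dst in ((a, b), (b, a)):
                if src == node and dst not in seen:
                    seen.add(dst)
                    frontier.append(dst)
    return (seen & WORDS) - {cue}

def all_link_states():
    """Every subset of the five links that could have survived encoding."""
    for k in range(len(LINKS) + 1):
        yield from combinations(LINKS, k)

# Encodings in which cue L retrieves V and nothing else.
l_gives_v_only = [s for s in all_link_states() if tree_recall("L", s) == {"V"}]
# What a subsequent cue A can retrieve under those encodings.
a_after = {frozenset(tree_recall("A", s)) for s in l_gives_v_only}

def fragment_recall(cue, fragments):
    """Fragmentation hypothesis: a cue retrieves every fragment that contains it."""
    out = set()
    for frag in fragments:
        if cue in frag:
            out |= frag - {cue}
    return out

twin = [{"L", "V"}, {"A", "O"}]  # the (LV, AO) twin-fragment
print(a_after, fragment_recall("L", twin), fragment_recall("A", twin))
```

Only one link state lets L retrieve V alone, and in that state A retrieves nothing, whereas the
twin-fragment yields V to cue L and then O to cue A.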

Anderson & Bower’s detailed quantitative predictions are based upon the assumption that the 
encoding in memory of the full propositional tree structure shown as their Fig. 10.1 (Anderson &
Bower, 1973, p. 284) is limited only by the time available. Consequently, at recall words close
together in the tree will act as better cues for each other than for more distant words, since in 
the latter case there is a greater chance of a breakdown in the linking chain of associations. 
Anderson & Bower suggest a particular theoretical distribution of the times at which particular 
numbers of associations will be formed, which leads to an expression (with parameter tb) for the 
probability of encoding k out of the n possible associations. 

To this point Anderson & Bower’s model is a special case of the fragmentation hypothesis, 
since the associations are postulated to be both symmetrical and all-or-none. In fact, Anderson 
& Bower’s parsing model postulates a particular distribution of fragmentation parameters. Table 
3 shows this distribution for two different values of tb. Approximately the best available 
agreement with the fragmentation parameter estimates is obtained when tb is given the value of
1.3. However, what agreement there is resides chiefly in the large proportions of both complete
recall (LAVO fragment) and complete absence of recall (null fragment). This feature is not 
intrinsic to HAM: rather, it was simply a determining factor in the post hoc choice of the 
encoding probability distribution (see Anderson & Bower, 1973, pp. 287-288). One cannot 
therefore regard Anderson & Bower's model as a valid predictor of fragmentation probabilities. 

In practice, Anderson & Bower found it necessary to add a further level of complexity to their 
model. Eight additional free parameters were added to the model to represent probabilistic input 
and output between the four different types of word and their corresponding mnemonic 
concepts. This had the effect of destroying the symmetric and all-or-none properties which are 
primary characteristics of the fragmentation hypothesis. It also required that the amount of
material postulated to be stored (but not necessarily recalled) should be increased considerably.
Anderson & Bower found that in these circumstances the approximately optimal value of tb was
now 3.9; the corresponding distribution of fragments would be that shown in the final column of
Table 3. Using a mixture of minimum-χ² and maximum-likelihood estimates of these nine free
parameters, Anderson & Bower found their model provided a reasonable fit for the 82 classes of
data reproduced in Table 2 (χ² = 79.44, d.f. = 72).


Comparison of the two analyses. It will be recalled that for the fragmentation hypothesis 2I was
60.52 with 69 d.f.; to ensure comparability, χ² was also evaluated, and found to have the value
58.87. Thus comparing the two models, a loss of only 3 d.f. improved χ² for the comparison of
data and model by 20.57. This is almost equivalent to a direct test between the two models, with
a highly significant result in favour of the fragmentation hypothesis. Such a comparison of two 
models is in fact valid only when one is a specific subset of the other. In the present experiment 
this is not strictly true. But Anderson & Bower’s model is related to a particular form of the 
fragmentation hypothesis, deviating only in its assumption of stochastic input and output, so that 
such a comparison is probably not misleading. 
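
The arithmetic of this comparison can be reproduced as follows. The sketch treats the difference
in χ² as a chi-square variate on 3 d.f., which, as noted above, is strictly justified only for
nested models, so the resulting probability should be read as indicative. The closed-form tail
function for 3 d.f. is standard; the function name is hypothetical.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variate with exactly 3 d.f. (closed form)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

# Goodness-of-fit values quoted in the text.
ham_chi2, ham_df = 79.44, 72      # Anderson & Bower's model
frag_chi2, frag_df = 58.87, 69    # fragmentation hypothesis

delta_chi2 = ham_chi2 - frag_chi2
delta_df = ham_df - frag_df
p_value = chi2_sf_3df(delta_chi2)

print(f"improvement in chi-square = {delta_chi2:.2f} on {delta_df} d.f., P = {p_value:.5f}")
```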

Finally, Anderson & Bower noted that although their overall fit was reasonable, there was a 
local discrepancy between the data and their model’s predictions of the effect of double-cuing 
relative to that of single-cuing. This showed up as the prediction of too many total recalls (in
lines 31, 34, ..., 64) and too few single recalls (32, 33, 35, ..., 66) to second cues: these were
expected to be 141.8 and 175.5 instead of the actual 101 and 211 respectively. Examination of
Table 2 shows that the expected values according to the fragmentation hypothesis are, on the
other hand, quite accurate, summing to 108.6 and 226.0 respectively.
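
These totals can be recovered directly from lines 31 to 66 of Table 2. The following sketch
transcribes those rows as read from the table (observed count, fragmentation expectation, HAM
expectation) and sums them separately for total recalls, which occupy every third line starting
at line 31, and for single recalls. The data structure and names are illustrative.

```python
# Table 2, lines 31-66: line number -> (observed, fragmentation, HAM).
second_cue_rows = {
    31: (6, 8.3, 9.7),    32: (6, 5.8, 4.9),    33: (11, 9.7, 7.7),
    34: (4, 8.3, 8.1),    35: (5, 5.8, 4.3),    36: (11, 10.3, 7.7),
    37: (11, 8.3, 9.5),   38: (8, 9.7, 7.3),    39: (11, 10.3, 8.4),
    40: (6, 9.4, 11.9),   41: (4, 7.1, 4.8),    42: (4, 11.3, 8.2),
    43: (5, 9.4, 10.2),   44: (7, 7.1, 4.7),    45: (13, 10.3, 7.7),
    46: (9, 9.4, 11.8),   47: (10, 11.3, 8.4),  48: (13, 10.3, 8.4),
    49: (12, 11.8, 18.6), 50: (14, 12.3, 8.8),  51: (11, 11.3, 10.9),
    52: (8, 11.8, 19.1),  53: (13, 12.3, 9.9),  54: (7, 9.7, 10.0),
    55: (14, 11.8, 18.0), 56: (14, 11.3, 10.9), 57: (11, 9.7, 9.1),
    58: (13, 6.7, 8.7),   59: (8, 12.3, 6.9),   60: (4, 7.1, 4.9),
    61: (5, 6.7, 8.8),    62: (10, 12.3, 7.6),  63: (3, 5.8, 5.0),
    64: (8, 6.7, 7.4),    65: (9, 7.1, 4.7),    66: (4, 5.8, 4.3),
}

def column_sums(rows):
    """Sum observed, fragmentation and HAM columns, rounded to one decimal place."""
    observed, frag, ham = (sum(col) for col in zip(*rows))
    return observed, round(frag, 1), round(ham, 1)

# Lines 31, 34, ..., 64 are total recalls (two words recalled to the second cue).
total_rows = [v for line, v in second_cue_rows.items() if (line - 31) % 3 == 0]
single_rows = [v for line, v in second_cue_rows.items() if (line - 31) % 3 != 0]

print("total recalls :", column_sums(total_rows))
print("single recalls:", column_sums(single_rows))
```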


Anderson & Bower Expt 2 


In this experiment Anderson & Bower studied memory for sentences containing five instead of 
four key-words. The sentences were of the type ‘The Subject Transitive-verb the Object who 
Intransitive-verb in the Location’ (STOIL; for example, ‘The hippie touched the debutante who
sang in the park’). As before, subjects were provided with one, two and finally three key-words
as cues for the remainder at recall. 

There were many more potential patterns of recall than in Expt 1. Even after Anderson & 
Bower had extended the classification of the data from 82 types (Table 2) to 135 types (Anderson 
& Bower, 1973, Table 10.8) in the present experiment, the published data are still compressed 
relative to those of Expt 1. The results of quantitative modelling of these data are as follows. 


Table 4. Percentage occurrence of each type of fragment in Expts 2 and 3 of Anderson & Bower
(1973)

    Fragment type     Expt 2     Expt 3
 1  STOIL               9.87      17.81
 2  STOI                1.43       1.01
 3  STOL                2.41       3.31
 4  STIL                1.73       1.55
 5  SOIL                3.37       5.63
 6  TOIL                1.34       1.53
 7  STO                 1.61       2.45
 8  STI                 0.10       0.42
 9  STL                 1.20       0.97
10  SOI                 1.23       0.61
11  SOL                 2.78       3.35
12  SIL                 1.81       1.37
13  TOI                 0.50       0.36
14  TOL                 1.08       0.48
15  TIL                 0.84       0.97
16  OIL                 3.59       5.34
17  ST                  3.68       4.75
18  SO                  3.22       2.90
19  SI                  0.64       0.68
20  SL                  2.35       3.14
21  TO                  1.43       1.26
22  TI                  0.17       0.39
23  TL                  0.78       0.86
24  OI                  1.81       2.24
25  OL                  5.02       4.17
26  IL                  4.47       3.22
27  Null               41.56      29.23

    Total             100.00     100.00

Note: S = subject, T = transitive verb, O = object, I = intransitive verb, L = location.
Fragmentation analysis. The 27 types of fragment shown in Table 4 were relevant in this
experiment. Twin-fragments have not been included since the categories of data which they
describe (additional recall subsequent to successful first recall) have not been given separately in
the published data.

Since incremental cuing in this experiment was not exhaustive (that is, three of the five
key-words of a sentence were used, not four), the fragment type is not always identifiable (cf.
Greeno & Steiner, 1964): certain classes of data may have arisen from more than one type of
fragment. (The experimental data of Anderson & Bower's Expts 2, 3 and 4 and their theoretical 
counterparts are not reproduced here for reasons of space, but are given by Jones (1974). The 
data of Expts 3 and 4, which were referred to by Anderson & Bower (1973), were kindly supplied 
by the authors.) For example, if the cuing of a sentence by S, by T, and finally by O was 
unsuccessful in each case (Anderson & Bower, 1973, Table 10.8, line 135), the relevant fragment 
could be either type 26 or type 27 (IL or null). 
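
The identifiability problem can be made concrete by enumerating the fragment types. In the sketch
below a fragment is taken to be any subset of two or more key-words, plus the null fragment, which
gives the 27 types of Table 4 for five key-words (and the 12 single-fragment types of Expt 1 for
four). A cue is assumed to fail exactly when the stored fragment does not contain it. The function
names are hypothetical.

```python
from itertools import combinations

def fragment_types(key_words):
    """All single fragments: every subset of two or more key-words, plus the null fragment."""
    frags = [frozenset(c)
             for size in range(len(key_words), 1, -1)
             for c in combinations(key_words, size)]
    frags.append(frozenset())  # the null fragment
    return frags

def consistent_with_failed_cues(frags, failed_cues):
    """Fragments that contain none of the cues that evoked no recall."""
    failed = set(failed_cues)
    return [f for f in frags if not (f & failed)]

stoil = fragment_types("STOIL")
candidates = consistent_with_failed_cues(stoil, "STO")

print(len(stoil))
print(["".join(sorted(f)) or "Null" for f in candidates])
```

For the sequence S, T, O with no recall at any stage, only the IL and null fragments remain as
candidates, as stated in the text.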

Maximum-likelihood estimates (shown in Table 4) of the probabilities of occurrence of the 
different fragments were therefore obtained, using a penalty function iteration (see Lootsma, 
1972), as those which minimized 2I for the overall comparison of data and theory. This
comparison indicated very satisfactory agreement (2I = 109.82, d.f. = (134 − 26) = 108).




Anderson & Bower’s analysis. Their model was of the same type as that used in their analysis of 
the previous experiment. Two additional free parameters were necessary since there were now 
five key-word types instead of four. For this experiment they found a significant discrepancy 
between data and model: χ² = 164.88, d.f. = (134 − 11) = 123, P < 0.01.


Comparison of the two analyses. It is apparent that the fragmentation hypothesis has again 
proved the more accurate, as it did in Anderson & Bower's first experiment. Anderson & Bower 
located the failure of their model in its underprediction of the probabilities of recall of T given 
S, S given T, IL given O, OL given I and OI given L, and sought to explain this in terms of 
subjects adopting a focusing strategy for ST and OIL. Their linguistic-based model would 
perhaps retain plausibility if these groupings corresponded to the two propositions that are 
postulated to underlie the full sentence: one of these is indeed OIL, but unfortunately the other 
is not ST, but STO. 

What has happened is that Anderson & Bower have proposed an explanation for the structure 
of the recall of sentences which is based upon a particular description of the linguistic structure 
of sentences (that is, HAM’s parser). But it is apparent in the present experiment that the 
structure of recall is not determined by this particular linguistic structure. Although it is possible 
that a different parser might produce more satisfactory predictions, the result does suggest that 
the storage of sentences in memory may not be governed by their linguistic structure per se. 

The satisfactory fit of the fragmentation hypothesis, on the other hand, suggests that it 
characterizes correctly the fundamental nature of the sentence engram. At the level of analysis 
described so far, it is a matter of indifference to the hypothesis how the observed distributions 
of different fragment types have arisen (that is, whether by attentional strategy, linguistic
analysis, contiguity or whatever). However, the investigation of the relative frequencies of 
formation of different types of fragment promises to be a fruitful field in the future because the 
experimental observations appear to relate so directly to the units of mnemonic storage. 

The analysis of the third of Anderson & Bower’s experiments, in which subjects employed 
imagery, is a step in this direction. It is shown that comparison of the distribution of 
fragmentation parameters in Expts 2 and 3 leads to conclusions concerning the effect of imagery 
which are almost the opposite of Anderson & Bower’s. 
Anderson & Bower Expt 3


This experiment was identical to the previous one, except that subjects ‘received instructions
strongly urging them to form vivid visual images of the sentences to be memorized’ (Anderson 
& Bower, 1973, p. 314). 

The outcome of quantitative modelling of the data is the same as before. Maximum-likelihood 
estimates of the fragmentation parameters in this experiment are shown in the final column of 
Table 4. These yield theoretical values for the numbers falling in the different data classes which
do not differ significantly from the experimental values (2I = 122.07, d.f. = 108). Anderson &
Bower’s model, on the other hand, misrepresents the data to a highly significant extent
(χ² = 267.40, d.f. = 123).


Imagery: Anderson & Bower Expts 2 and 3 compared. Experiment 3 differed from Expt 2 only in 
that subjects were instructed to use visual imagery. The effects of this imagery instruction can 
be assessed by comparing the estimated values of the free parameters in the two experiments. 
These free parameters can be those specified either by the fragmentation hypothesis or by 
Anderson & Bower’s model. It is shown that the interpretation of the effects yielded by a 
fragmentation analysis is the opposite of that arrived at by Anderson & Bower. 


Fragmentation analysis. Statistical comparison of the fragmentation parameters given in Table 4 
shows that the probabilities of occurrence of certain fragments (types 1 and 5) were significantly 
increased by imagery instructions at the expense of the null fragment (type 27). 

Firstly, an arcsine transformation, 2 arcsin √(P/100), was applied to each of the percentages
shown in Table 4. Secondly, for each of the 27 fragment types the square of the difference 
between the two transformed values was calculated. The mean square provides a measure of the 
deviation between the two sets of experiments. Thirdly, an onion-peeling procedure was used to 
determine which differences were significantly large. That is, the largest squared difference was 
compared with the average of the sum of squares of the remainder; then the next largest; and so 
on until the current square did not differ significantly from the remainder. The method is thus 
analogous to the partitioning of an ANOVA (see Kullback, 1959, pp. 214-219). Adopting the 0.05
level as the criterion of significance, the comparisons for fragment types 27, 1 and 5 were
significant, with F = 14.78, d.f. = 1,25; F = 21.80, d.f. = 1,24; and F = 5.87, d.f. = 1,23,
respectively; but that for the next highest difference (type 16) was not, F = 3.98, d.f. = 1,22.
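
The three steps can be sketched in Python as follows. The transformation and the squared
differences follow the description above and are applied here to four rows of Table 4 only. The
peeling function is a simplified reading of the procedure: it divides each candidate by the mean
of the squares still remaining, and leaves the choice of critical value to the caller, so it does
not claim to reproduce the exact divisors, degrees of freedom or F ratios reported in the text. The
function names, the toy data and the fixed critical value of 4.0 are all hypothetical.

```python
import math

def arcsine(pct):
    """Arcsine transformation of a percentage: 2 * arcsin(sqrt(P / 100))."""
    return 2.0 * math.asin(math.sqrt(pct / 100.0))

def onion_peel(sq_diffs, critical_f):
    """Peel off the largest squared differences while they exceed the critical ratio.

    sq_diffs:   {label: squared difference}
    critical_f: function of the number of remaining squares, returning the threshold
    """
    remaining = dict(sq_diffs)
    significant = []
    while len(remaining) > 1:
        label = max(remaining, key=remaining.get)
        largest = remaining.pop(label)
        ratio = largest / (sum(remaining.values()) / len(remaining))
        if ratio < critical_f(len(remaining)):
            break
        significant.append((label, ratio))
    return significant

# Four rows of Table 4: fragment type -> (Expt 2 %, Expt 3 %).
table4_subset = {27: (41.56, 29.23), 1: (9.87, 17.81), 5: (3.37, 5.63), 16: (3.59, 5.34)}
sq_diff = {k: (arcsine(a) - arcsine(b)) ** 2 for k, (a, b) in table4_subset.items()}
ranked = sorted(sq_diff, key=sq_diff.get, reverse=True)
print(ranked)

# Toy demonstration of the peeling loop with a HYPOTHETICAL fixed critical value.
toy = {"a": 10.0, "b": 1.0, "c": 1.0, "d": 1.0}
print(onion_peel(toy, critical_f=lambda n_remaining: 4.0))
```

The ranking of the four squared differences matches the order in which the fragment types are
examined in the text (27, 1, 5 and then 16).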

Types 1, 5 and 16, which increased most in likelihood of occurrence, corresponded to 
fragments STOIL, SOIL and OIL respectively. Thus the grouping ‘. . .the debutante who sang in 
the park’ (OIL) seems to be particularly enhanced by imagery instructions. This may be because 
the coding of information set in a locational context is particularly improved by the imaging 
operation; this explanation is consistent with the finding reported earlier that recall involving 
location is particularly improved by pictorial rather than verbal presentation. 

Anderson & Bower’s analysis. Anderson & Bower derived a rate parameter from each of the 11 
free parameters that they used in Expts 2 and 3, and noted that these increased uniformly by a 
factor of approximately 1.4. They interpreted this as demonstrating that the effect of imagery on
memory for sentences is simply to raise the overall level of recall, rather than to alter the
relative amounts of total or partial recall. Such a change in the pattern of recall, however, is
precisely what the preceding fragmentation analysis showed to be occurring.


Anderson & Bower Expt 4 


In this experiment Anderson & Bower attempted to pit the hypothesized effects of a sentence’s 
parsed structure against the possible effects of its overt surface structure. This was necessary
since their predictions in the previous experiments were confounded to a large extent with those
to be expected on the basis of physical contiguity alone. The sentences which were used were of 
the somewhat convoluted form ‘In the Location the Object during the Time was Relationed by 
the Subject’ (LOTRS - for example, ‘In the park the debutante during the night was touched by 
the hippie’). 


Fragmentation analysis. This experiment is of interest as the only one of the four to yield data 
which were discrepant with the fragmentation hypothesis. 

This discrepancy can be readily demonstrated by examining the overall single-cue probabilities,
that is, the average probabilities of correctly recalling words to first cues regardless of the precise
patterns of recall (Anderson & Bower, 1973, Table 10.124). These probabilities are postulated to
be reflexive, P(b|a) = P(a|b), since recall of b given a will only occur if the relevant fragment
contains both a and b, in which case cuing with b should similarly produce a. In the present
experiment, however, there was overall a marked asymmetry (2I = 63.13, d.f. = 10). This was
because of the four relationships involving Time, T, which were significantly asymmetric overall
(2I = 54.57, d.f. = 4) and also individually, recall being significantly better in each case when
Time was being cued for than when it was used as cue. The six relationships not involving Time,
on the other hand, did not deviate significantly from reflexivity (2I = 8.56, d.f. = 6).
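
The reflexivity prediction follows directly from the way single-cue probabilities are built up
from fragments, as the sketch below illustrates. It uses the Table 5 percentages for this
experiment (Time words deleted) and computes P(target | cue) as the summed probability of every
fragment containing both words; the function name is hypothetical.

```python
# Fragment estimates from Table 5 (per cent); the null fragment never produces recall.
fragment_pct = {
    "LORS": 15.79, "LOR": 3.19, "LOS": 5.51, "LRS": 3.64, "ORS": 2.83,
    "LO": 10.97, "LR": 2.07, "LS": 4.03, "OR": 2.79, "OS": 2.88, "RS": 4.97,
}

def p_recall(target, cue):
    """P(target | cue): total probability of fragments holding both target and cue."""
    return sum(pct for frag, pct in fragment_pct.items()
               if target in frag and cue in frag) / 100.0

for a, b in [("L", "O"), ("L", "R"), ("O", "S")]:
    print(a, b, round(p_recall(b, a), 4), round(p_recall(a, b), 4))
```

Because the same fragments enter both sums, the two conditional probabilities are equal by
construction; any reliable asymmetry in the data, such as that found for Time words, therefore
counts against the hypothesis.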

The extent to which these other four types of key-word conformed to the fragmentation
hypothesis was examined by modelling the experimental data after they had been condensed
over the Time component: only the single-cued data had been published in sufficient detail for
this to be possible. Table 5 shows maximum-likelihood values for the 12 different types of
fragment relevant. Twin-fragments are not included since they are redundant if only single-cued
data are under consideration; here the effect of an increase of x per cent in (ab, cd) fragments is
exactly equivalent to that of an increase of x per cent in each of ab and cd fragments together
with a decrease of 2x per cent in null fragments. Substitution of the values of Table 5 led to a
set of theoretical values whose discrepancy with the observed numbers in each class of data
only just reached significance (2I = 28.87, d.f. = (28 − 11) = 17, P < 0.05). It seems, therefore, that
the aberration from the fragmentation hypothesis was indeed localized to the Time word, since
even after extensive post hoc condensation and abbreviation the fit of the remainder only just
failed.


Table 5. Percentage occurrence of each type of fragment (after deletion of T, Time-words) in
Expt 4 of Anderson & Bower (1973)

    Fragment      Occurrence (%)
 1  LORS              15.79
 2  LOR                3.19
 3  LOS                5.51
 4  LRS                3.64
 5  ORS                2.83
 6  LO                10.97
 7  LR                 2.07
 8  LS                 4.03
 9  OR                 2.79
10  OS                 2.88
11  RS                 4.97
12  Null              41.31
    Total            100.00

Note: L = location, O = object, R = relation, S = subject.

The anomalous behaviour of Time words bears a superficial resemblance to a result obtained 
by Jones (1976). In that experiment recall involving sequential position (perfectly correlated with 
temporal position) was markedly asymmetric also. However, in the present case the direction of 
the effect has been reversed: previously recall was superior when sequential position was used 
as cue rather than cued for. 

The most likely explanation of the Time-word asymmetry seems to be that it is an interference 
phenomenon. Anderson & Bower used 60 different examples of each of the other four types of 
words for making up the three lists of 20 sentences each. Due to their lexical scarcity, however, 
they had to use the same 20 Time words in each of the three sets. Thus the apparent inefficiency 
of Time words as cues may have arisen because they sometimes mistakenly accessed fragments 
derived from the sentences of previous lists. 


Anderson & Bower's analysis. Examination of the experiment’s qualitative results implied to 
Anderson & Bower that contrary to their prediction, physical contiguity appeared to provide a 
better basis than HAM’s propositional structure for the description of their data. Accordingly 
they suggested an alternative parsed structure (incorporating the passive relationship directly) 
which was similar to the surface structure of the sentence. Nevertheless, the discrepancy 
between model and data was still highly significant (χ² = 204.05, d.f. = 123).


Configural information in memory 


In addition to the experiments analysed in detail above, there are two other series of recent 
experimental results which the fragmentation hypothesis should be able to account for if it is to 
be applied to sentence recall. Both series were originated by Anderson & Bower (1972; see also 
1973, chapter 11), and have been addressed to the question of whether sentences are stored in a 
purely associative way or whether configural information is also stored. If, as a Gestalt theory 
would predict, configural or emergent information is indeed stored, then the level of recall with 
two cues for the same information should be greater than that to be expected if the two cues 
operated non-interactively. This prediction has been tested in two ways, using cross-over cuing 
as well as straightforward cuing for the recall of the Object term of simple SVO 
(subject-verb-object) sentences such as ‘The child hit the landlord’.

It is shown here that the two series of findings are consistent with the fragmentation 
hypothesis version of the general associative model (though inconsistent with Anderson & 
Bower’s version), and thus that the existence of configural sentential information has yet to be 
established. 


Cross-over cuing. Anderson & Bower (1972) showed subjects pairs of SVO sentences which 
shared a common object, for example: 

(1) The child hit the landlord. 

(2) The minister praised the landlord. 

Subsequently, recall of O was cued by either S alone, V alone, S and V from the Same sentence
(S_1 V_1), or S and V Crossed-over from different sentences (S_1 V_2). As shown below, Anderson &
Bower’s associative model predicted that the level of recall produced by the Crossed-over cue 
should be higher than that produced by the Same cue, while the fragmentation hypothesis more 
accurately predicts that in differing circumstances either the Crossed-over or the Same cue can 
be the more effective. 

Anderson & Bower’s model, as before, assumes that recall is probabilistically determined by
propositional structure. The structure derived from an SVO sentence has three segments, from S
to the predicate node (P), from V to P, and from P to O: their probabilities of being intact at
recall may be labelled a, b, and c, respectively. Assuming that these probabilities are
independent, the probabilities of recall to Same and Crossed-over cues are given by:


    P(O|S_1 V_1) = ac + bc − abc                     (1)

    P(O|S_1 V_2) = ac + bc − (ac)(bc),               (2)

respectively. Hence it is predicted that the Crossed-over cue is the more effective, with

    P(O|S_1 V_2) − P(O|S_1 V_1) = abc(1 − c).        (3)


According to the fragmentation hypothesis, on the other hand, the probabilities of formation of 
three different types of fragment are relevant. If L, M, and N represent the probabilities of 
formation of fragments SVO, SO, and VO, respectively, the Same and Crossed-over 
probabilities are 


    P(O|S_1 V_1) = L + M + N                                      (4)

    P(O|S_1 V_2) = (L + M) + (L + N) − (L + M)(L + N)             (5)
                 = L + M + N + L[1 − (L + M + N + MN/L)].

Hence

    P(O|S_1 V_2) − P(O|S_1 V_1) = L[1 − (L + M + N + MN/L)].      (6)


Thus it can be seen that the fragmentation hypothesis predicts that Crossed-over cues can be 
either more effective or less effective than Same cues, depending on whether the expression 
(L+M+N+MN/L) has a value less than or greater than one, respectively. Hence one situation 
in which it is predicted that the Same cue is the more effective is that in which the probability of 
formation of an SVO fragment (i.e. L) approaches zero (i.e. subjects find sentences too difficult 
to remember any in their entirety), with in this case equation (6) taking the value −MN. This
prediction is in accord with experiment, since whereas Anderson & Bower (1972) found 
Crossed-over cues to be the more effective, Foss & Harwood (1975), studying relatively low 
levels of recall, found Same cues to be superior. 
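
The contrast between the two sets of predictions can be explored numerically. The sketch below
evaluates equations (1) to (6) for arbitrary parameter values; the particular numbers used are
hypothetical and serve only to show that the associative model always favours the Crossed-over
cue, whereas the fragmentation hypothesis can favour either.

```python
def ham_same(a, b, c):
    """Equation (1): recall of O to S and V from the same sentence."""
    return a * c + b * c - a * b * c

def ham_crossed(a, b, c):
    """Equation (2): recall of O to S and V from different sentences."""
    return a * c + b * c - (a * c) * (b * c)

def frag_same(L, M, N):
    """Equation (4): L, M, N are the probabilities of SVO, SO and VO fragments."""
    return L + M + N

def frag_crossed(L, M, N):
    """Equation (5): two independent chances of retrieving O."""
    return (L + M) + (L + N) - (L + M) * (L + N)

# HYPOTHETICAL parameter values, chosen only for illustration.
a, b, c = 0.6, 0.5, 0.7
print("associative model:", ham_crossed(a, b, c) - ham_same(a, b, c))

for L, M, N in [(0.30, 0.10, 0.10), (0.00, 0.30, 0.30)]:
    print("fragmentation:", (L, M, N), frag_crossed(L, M, N) - frag_same(L, M, N))
```

With a substantial probability of complete (SVO) fragments the Crossed-over cue is superior; with
none, the difference reduces to −MN and the Same cue is superior.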


Straightforward cuing. Direct comparison may also be made of the effectiveness of the S, V, and
SV (i.e. S_1 V_1) cues for recall, uncomplicated by any consideration of the presence or otherwise
of other sentences sharing the same object. 

In this case the prediction from Anderson & Bower’s model follows from equation (1), that is, 


$$P(O \mid SV) = ac + bc - abc.$$
Now $P(O \mid S) = ac$, and $P(O \mid V) = bc$. Hence 
$$P(O \mid SV) = P(O \mid S) + P(O \mid V) - P(O \mid S)\,P(O \mid V)/c,$$

i.e.
$$P(O \mid SV) \le P(O \mid S) + P(O \mid V) - P(O \mid S)\,P(O \mid V). \tag{7}$$


Assuming the fragmentation hypothesis, however, the upper bound of double-cued recall is 
equal to the sum of the corresponding single-cued probabilities (this boundary represents the 
failure to encode any SVO fragments). That is, 


$$P(O \mid SV) \le P(O \mid S) + P(O \mid V). \tag{8}$$
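
The difference between the two bounds can be seen in a small sketch. It assumes, following the definitions of L, M, and N above, that under the fragmentation hypothesis P(O|S) = L + M, P(O|V) = L + N, and P(O|SV) = L + M + N. All numerical values are hypothetical.

```python
def bound_7(p_s: float, p_v: float) -> float:
    """Upper bound on double-cued recall under Anderson & Bower's model, equation (7)."""
    return p_s + p_v - p_s * p_v


def bound_8(p_s: float, p_v: float) -> float:
    """Upper bound on double-cued recall under the fragmentation hypothesis, equation (8)."""
    return p_s + p_v


# Hypothetical link probabilities for Anderson & Bower's model.
a, b, c = 0.6, 0.5, 0.8
ab_single_s, ab_single_v = a * c, b * c
ab_double = a * c + b * c - a * b * c
print(ab_double <= bound_7(ab_single_s, ab_single_v))

# Hypothetical fragment probabilities (SVO, SO, VO).
L, M, N = 0.1, 0.3, 0.3
frag_single_s, frag_single_v = L + M, L + N
frag_double = L + M + N
print(frag_double <= bound_8(frag_single_s, frag_single_v))
print(frag_double > bound_7(frag_single_s, frag_single_v))
```

With these illustrative values the fragment model respects bound (8) while exceeding bound (7), which is the kind of pattern that separates the two accounts empirically.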


Ten sets of data relevant to this issue were examined, and proved to be consistent with the 
fragmentation hypothesis. The results were in agreement with equation (8) for seven of the sets 
(Anderson, 1976, chapter 10, Expts 2 and 3; Anderson & Bower, 1972, Expts 1, 2, 3, 4a, and 
4b). Three results were in numerical disagreement with equation (8), but in each case not 
significantly so (Anderson, 1976, chapter 10, Expt 1; Foss & Harwood, 1975, Expts 1 and 2). 
Anderson & Bower’s equation (7), on the other hand, was contradicted in all cases except two 
(Anderson & Bower, 1972, Expts 1 and 2). 

The fragmentation hypothesis thus provides a successful account of both the cross-over and 
straightforward cuing of SVO sentences. The theoretical parsimony of a purely associative 
model (e.g. Anderson & Bower, 1973; Jones, 1976) need not be abandoned in favour of assuming 
the existence of configural sentential information accessible only to two or more cues for the 
same sentence (e.g. Foss & Harwood, 1975; Anderson, 1976). 

Discussion 

The results reviewed here have provided substantial evidence that for sentences, as for pictures, 
the functional unit of memory, the mnemonic trace, is a fragment of the original item. A 
fragment represents a part or parts of the original situation stored in memory. Using orthogonal 
cued recall it is only possible to ascertain whether or not the fragment contains components 
corresponding to those used either as cue or as answer; it might also, of course, contain 
incidental features such as the setting in which an item was presented. Recall of a fragment 
occurs when elements corresponding to one or more of its components are given as cue. In this 
sense, recall operates in an all-or-none manner. That is, if the cue is a constituent of the 
fragment then all the other components are available each time it is used. If it is not, no access 
is possible. Memory is not predicted to be all-or-none, however, in the more usual sense that a 
particular event is recalled either in its entirety or not at all. On the contrary, this should only be 
observed in the unusual case in which each memory fragment contains either all the information 
to be remembered or none of it. 

It should also be stated that although the fragmentation hypothesis appears to accurately 
describe the recall process in a wide range of experiments, it naturally does not do so in all. 
Perhaps the most important class of exceptions occurs when those elements of a situation to be 
remembered which are specified by the experimenter do not match perfectly those elements 
which the task itself requires the subject to attend to. For example, consider the case in which a 
subject has to learn a CVC-CVC paired associate. The experimenter may specify this as a 
two-component A-B task. However, the subject may attend to each letter separately, that is, it 
may be effectively an A₁A₂A₃-B₁B₂B₃ task. If the paired associate is learnt in the usual way, 
the subject needs only to be able to distinguish the stimulus term from other stimulus terms, and 
thus may attend only to, perhaps, the first letter. Any resulting A₁B₁B₂B₃ fragment would be 
adequate for responding with the B term when cued with the A term. If, however, the subject is 
cued with the normal response term, B, then the associated term is merely A₁, and correct recall 
of A does not occur. Thus although the fragmentation hypothesis makes a general prediction of 
associative symmetry (i.e. P(A|B) = P(B|A); see Asch & Ebenholtz, 1962), it is recognized that 
in an inherently asymmetric paradigm, such as that of paired associates, overt backward 
association will be weaker than forward association (e.g. Ekstrand, 1966). Similarly, while the 
presence of full AB fragments would mean that subjects should be able to recognize the A-B 
pairing no better than they recall B when cued with A, or vice versa, A₁A₂B₁B₂ fragments (for 
example) would be adequate for the recognition task but not for the recall task. 
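
The asymmetry in the paired-associate example can be sketched by treating a fragment as a set of components with all-or-none access. This is only an illustration of the verbal account above; the set representation and the function names are assumptions of this example, not a formalism taken from the article.

```python
# Components of a CVC-CVC paired associate, one per letter.
A_TERM = frozenset({"A1", "A2", "A3"})
B_TERM = frozenset({"B1", "B2", "B3"})


def retrieved(fragment: frozenset, cue: frozenset) -> frozenset:
    """All-or-none access: any cue component in the fragment yields the rest of it."""
    if fragment & cue:
        return fragment - cue
    return frozenset()


def recalls_term(fragment: frozenset, cue: frozenset, target: frozenset) -> bool:
    """The target term is recalled only if every one of its letters is retrieved."""
    return target <= retrieved(fragment, cue)


# The subject attended only to the first stimulus letter.
fragment = frozenset({"A1"}) | B_TERM

forward = recalls_term(fragment, A_TERM, B_TERM)   # cue with A, recall B
backward = recalls_term(fragment, B_TERM, A_TERM)  # cue with B, recall A
print(forward, backward)
```

Forward recall succeeds because the fragment holds the whole of B, whereas backward recall retrieves only the first letter of A.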

It is also difficult to apply the fragmentation hypothesis quantitatively to the free recall and 
recognition methods, for a similar reason. Both methods clearly depend upon the remembering 
of information about the context in which items occurred, but much of this information is 
uncontrolled. As suggested by Anderson & Bower (1974), a subject might link a word to be 
remembered with transient features of his environment. When subsequently presented with the 
word he might correctly recognize it as a result of recalling a transient feature associated with it; 
but when asked to free recall it he might be able to do so only if it were also linked with a 
feature used as a general cue for the list. 

The restriction of the range of the quantitative application of the fragmentation hypothesis to 
those cases in which cue and information to be recalled are explicitly controlled is not viewed 
here as a disadvantage of the hypothesis. Rather, it is in keeping with normal scientific practice; 
for example, the observation of a constant value for the acceleration due to gravity is of general 
importance even though it occurs only in controlled circumstances (e.g. in a vacuum). The 
analogy may be pursued further. In objection to the fragmentation hypothesis, it may be thought 
to be important that it does not specify in advance of experiment the probabilities of occurrence 
of different types of fragment. But there is logically no more force to this objection than there is 
to the objection that the value of the constant acceleration due to gravity, 981 cm/sec², is also 
the subject of experimental determination. The fragmentation hypothesis is concerned with the 
structural representation of human memory: it does not specify in advance the composition of 
the memory trace. However, once the validity of the hypothesis’s structural representation has 
been demonstrated the fragment is established as a unit of analysis. It is then possible to 
elucidate problems concerning the composition rather than the structure of the trace. 

It has been shown here that memories derived from sentences, visually imaged sentences, and 
pictures are isomorphic. That is, the memory process operates in a structurally identical way 
upon fragments of input which were either verbal or visual in format, and encoded either with or 
without the operation of mental imagery (cf. Pylyshyn, 1973). Thus it is argued that a valid 
distinction between visual and verbal memory can only be made in terms of content, not form. 
Significant differences were in fact shown here to exist between the levels of recall of pictures 
and of dimensionally equivalent verbal descriptions, and also between sentences and imaged 
sentences. It was found that visual presentation and imagery instructions both particularly 
enhanced the probability of encoding locational information. 


Anderson & Bower's (1973) model 


Since a substantial proportion of this article has been concerned with the representation of data 
of Anderson & Bower, it is appropriate to compare the fragmentation hypothesis with Anderson 
& Bower’s own model. 

Empirically, the finding that the fragmentation hypothesis was the more accurate was 
unequivocal. Theoretically, the models are of interest as the products of two different 
methodologies. To dichotomise, the fragmentation hypothesis and HAM are the results of 
associationist and neo-associationist programmes, respectively (see Anderson & Bower, 1973, 
chapter 1). The fragmentation hypothesis arose as the result of examining experimental data for 
basic structural properties such as asymmetry (Jones, 1976). The design of HAM, on the other 
hand, was greatly shaped by non-experimental factors such as intuitive plausibility and 
computational feasibility. 

It may also be noted that Anderson & Bower were concerned to predict not only the structure 
of recall but its content. To a substantial extent, though not entirely, the two types of prediction 
by HAM are theoretically independent. That is, one part of the model can be altered without 
necessitating any alteration in the other part, although the overall prediction is obviously altered 
in each case. Conversely, a failure of prediction by the model can be corrected by altering either 
its structural or its content component. During consideration of Expt 1 of Anderson & Bower 
(1973, chapter 10) here, it was pointed out that an intermediary stage in the formulation of 
Anderson & Bower’s sentence recall model was actually a special case of the fragmentation 
hypothesis. The analyses conducted here suggest that in order to attempt to fit the empirical 
data, Anderson & Bower should at that point have adjusted the content component (e.g. HAM’s 
linguistic parser) rather than the structural component of their model. A further implication is 
that the fragmentation hypothesis and the general HAM framework are not incompatible. Rather, 
the fragmentation hypothesis places a powerful empirical constraint on the structural aspects 
(directly) and the content aspects (indirectly) of HAM-like models. 
The horizontal-vertical illusion and the square 


I. C. McManus 


The horizontal-vertical illusion, in which a vertical dimension is overestimated relative to a 
horizontal one, is usually described in terms of either an L figure or an inverted T (⊥), and as 
such has been extensively investigated. It is usually claimed (e.g. Robinson, 1972) ‘that a plain 
square is too stable a configuration perceptually to suffer from the vertical-horizontal illusion’. 
However, the present author has been unable to find any reference to support this assertion, and 
it has therefore been tested experimentally. 


Method 

Five white cards (76 × 51 cm) were prepared on which were placed nine rectangles. One of these rectangles 
was an exact square whilst the others differed from a square by +1.2, +2.3, +4.7, +7.2, −1.2, −2.3, −4.7 
and −7.2 per cent, a positive value indicating that the horizontal dimension was greater than the vertical. On 
each card either the horizontal or the vertical dimension of the rectangles was kept constant at 105 mm, and 
the other side varied proportionately. Three of the cards contained solid black rectangles (of Letrafilm 
236M), one with the vertical dimension kept constant (card B2), and two with the horizontal dimension kept 
constant (cards B1 and B3). The other two cards contained rectangles formed of square-wave gratings (of 
Letraset LT 107), one with the grating vertical (card V), and one with the grating horizontal (card H): in 
these cases the fixed side was perpendicular to the direction of the grating. Each rectangle contained 54 
cycles of the grating. On each card the nine rectangles were arranged randomly in a 3 × 3 matrix, the 
arrangement being different for each of the five cards. The cards were shown to each subject in the order 
B1, B2, V, H, B3; they were viewed from a distance of 2 m and subjects were tested individually. The 
subjects were informed that one, and only one, of the rectangles on each card was a perfect square and all 
that they had to do was to say which one they thought it was. No time limit was set and most subjects made 
a decision within 10 or 15 sec. The subjects were not told whether their choices were correct. Subjects were 
instructed not to tilt their heads whilst looking at the cards. 
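
The stimulus dimensions described above can be tabulated with a short sketch. It assumes that a deviation of d per cent means the horizontal side is (1 + d/100) times the vertical side; the Method does not state the exact definition, so this ratio, like the function name, is an assumption of the example.

```python
FIXED_SIDE_MM = 105.0
DEVIATIONS_PER_CENT = [0.0, 1.2, 2.3, 4.7, 7.2, -1.2, -2.3, -4.7, -7.2]


def rectangle(deviation_per_cent: float, fixed: str = "vertical") -> tuple:
    """Return (horizontal_mm, vertical_mm) for one rectangle on a card."""
    ratio = 1.0 + deviation_per_cent / 100.0  # assumed: horizontal / vertical
    if fixed == "vertical":
        return FIXED_SIDE_MM * ratio, FIXED_SIDE_MM
    if fixed == "horizontal":
        return FIXED_SIDE_MM, FIXED_SIDE_MM / ratio
    raise ValueError("fixed must be 'vertical' or 'horizontal'")


# Card with the vertical side held constant (as on card B2).
for d in DEVIATIONS_PER_CENT:
    horizontal, vertical = rectangle(d, fixed="vertical")
    print(f"{d:+.1f} per cent: {horizontal:.2f} x {vertical:.2f} mm")
```

Under this assumption the largest rectangle differs from the square by under 8 mm on a 105 mm side, which gives a sense of how fine the discrimination was.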


Results 


Sixty subjects, most of whom were undergraduate members of the University of Cambridge, 
took part in the experiment. Each subject made a total of three choices of solid black rectangles 
(cards B1, B2, and B3). The horizontal-vertical illusion was clearly shown, 26 (14.44 per cent) 
choices being for the exact square, 123 (68.34 per cent) for rectangles with horizontal sides of 
greater length than vertical sides, and 31 (17.22 per cent) for rectangles with vertical sides of 
greater length. The mean illusion was +1.58 per cent, which is significantly different from 0 
(t = 6.99, d.f. = 59, P < 0.001). No significant difference was found between those figures with a 
constant horizontal side (cards B1 and B3), and those with a constant vertical side (card B2), the 
mean illusions being +1.55 and +1.64 per cent respectively (t = 0.79, d.f. = 59, n.s.). The 
presence of the horizontal and vertical gratings on cards V and H significantly affected the 
results, the mean illusion in the figure with the vertical lines being +0.72 per cent, and that in 
the figure with horizontal lines being +2.98 per cent, both values being significantly different 
from zero (t = 2.46, d.f. = 59, P < 0.02, and t = 9.99, d.f. = 59, P < 0.001, respectively). Both 
values are significantly different from the responses with solid figures (t = 2.71, d.f. = 59, 
P < 0.01, and t = 5.34, d.f. = 59, P < 0.001, respectively). This difference between the figures is 
presumably an interaction between the horizontal-vertical illusion and the Helmholtz illusion. 
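
The reported statistics are consistent with a one-sample t test on 60 subjects (d.f. = 59), for which t is the mean divided by its standard error. The sketch below assumes that form of test, which the note itself does not name, and uses only the reported mean and t value to recover the implied spread of scores.

```python
import math


def one_sample_t(mean: float, sd: float, n: int) -> float:
    """t statistic for testing a sample mean against zero."""
    return mean / (sd / math.sqrt(n))


def implied_sd(mean: float, t: float, n: int) -> float:
    """Standard deviation implied by a reported mean and one-sample t value."""
    return mean * math.sqrt(n) / t


# Values reported for the solid black rectangles.
n_subjects = 60
mean_illusion = 1.58  # per cent
reported_t = 6.99

sd = implied_sd(mean_illusion, reported_t, n_subjects)
print(sd)                                           # implied SD, in per cent
print(one_sample_t(mean_illusion, sd, n_subjects))  # recovers the reported t
```

On this assumption the between-subject standard deviation of the illusion is of about the same size as the mean illusion itself.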


Conclusions 


The method described presents a convenient choice method for demonstrating the existence of 
the horizontal-vertical illusion in the square. The interaction with the Helmholtz illusion is 
significant, allows a new approach to the analysis of the illusion, and may also be of use in 
nullifying the illusion in an applied situation. 
Some comments on Brée & Coppens’ ‘The difficulty of an 
implication task’ 


David Moshman 


In a recent article in this Journal, Brée & Coppens (1976) tested Brée’s (1973) model of 
performance on Wason’s extensively studied ‘four-card task’. The Brée model is of considerable 
interest in that it (a) differentiates comprehension of the proposition to be tested from the 
hypothesis-testing strategy itself (as do Smalley, 1974, and Moshman, 1977), and (b) is closely 
related to Piaget’s theory of formal operations (Inhelder & Piaget, 1958) in its consideration of 
combinatorial analysis (elaboration of possibilities) and hypothetico-deductive reasoning 
(reasoning based on possibilities rather than facts). Unfortunately, the test of the model is 
marred both by incorrect predictions and questionable exclusion of subjects from the data 
analysis. 

The predictions of the model are discussed on p. 581 of the Brée & Coppens (1976) article and 
schematized in Table 1 on that page. There seem to be three discrepancies between the table and 
the verbal discussion. First, with regard to the q̄ card and the illative interpretation, it is stated 
that ‘subjects using either strategy A or B would not select this card’. But Table 1 indicates that 
strategy B would indeed lead to the selection of the q̄ card for subjects using the illative 
interpretation. The table, rather than the text, seems to be correct in that such subjects ‘select 
any object which could have a symbol (hidden or visible) requiring the presence of any other 
particular symbol’, and there clearly could be a p on the other side, requiring the presence of a q. 

Later on the same page, the authors state that ‘subjects interpreting the proposition as illative 
implication should never select a card with an odd digit (q̄), no matter which strategy they use’. 
But this conflicts with their explanation in the preceding paragraph of why this card will be 
chosen by subjects using strategy C, and with the indication in Table 1 that q̄ will be chosen by 
strategy B subjects as well. Again, the table seems to be correct, not the text. 

Finally, still discussing the effect of subjects’ interpretation of the proposition on choice of q̄, 
the authors state that ‘those interpreting it as a converse, should always select such a card’. But 
according to Table 1, subjects using a converse interpretation will not choose q̄ if they are using 
strategy A. There is perhaps an ambiguity in the theory here. It could be argued that since, 
according to the definition of the converse interpretation in Table 1, subjects see pq̄ cases as 
disconfirming, a card with a q̄ on it requires a p̄ on the other side and should thus be selected 
regardless of strategy (as suggested in the text). On the other hand, if we take the converse 
interpretation simply to mean an interpretation in which p requires q and vice versa, then q̄ does 
not inherently require p̄ and should be selected only by subjects considering the implications of 
hidden symbols (strategies B and C). This latter reading of the theory, which leads to the 
predictions in Table 1 (but disagrees with those of the text), seems preferable in that it makes 
the modus tollens inference (p → q; q̄; therefore p̄) a matter of strategy rather than of 
interpretation. Taken this way, the Brée model has interesting implications not only for four-card 
task performance but for deductive reasoning in general. For example, recognizing the validity 
of modus tollens may not be inherent in the comprehension of implication but require a formal 
operational deductive strategy in which possible conclusions are considered (combinatorial 
analysis) and their implications traced out (hypothetico-deductive reasoning), a strategy 
analogous to four-card task strategies B and C in the Brée model. 
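
The logical background to this argument can be made concrete with a truth-table sketch. It uses the standard material-implication reading of the rule, not the Brée model's interpretations or strategies, and the card labels and function names are assumptions of this example.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


# Modus tollens: wherever 'p implies q' holds and q is false, p must be false.
modus_tollens_valid = all(
    not p
    for p, q in product([True, False], repeat=2)
    if implies(p, q) and not q
)

# Each card shows one symbol; the other side is hidden and may take either value.
CARDS = {
    "p": ("p", True),
    "not-p": ("p", False),
    "q": ("q", True),
    "not-q": ("q", False),
}


def can_falsify(symbol: str, value: bool) -> bool:
    """True if some hidden value would make the card a 'p and not-q' case."""
    for hidden in (True, False):
        p, q = (value, hidden) if symbol == "p" else (hidden, value)
        if not implies(p, q):
            return True
    return False


informative = [name for name, (symbol, value) in CARDS.items() if can_falsify(symbol, value)]
print(modus_tollens_valid, informative)
```

Seeing that the not-q card is informative requires considering what might be on its hidden side, which is the step the text treats as a matter of strategy rather than of comprehension.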

Moving to the Results section of the Brée & Coppens paper, it should first be noted that 
Table 2, in which the results are presented, is misleading. The figures for both groups combined 
(third section of the table) have apparently been summed across strategy as well as across 
groups, though the table does not clearly indicate this. Thus, under converse implication, the 
numbers 11, 2, and 4 all come from column A (not B), while under illative implication the 
number 6 is not really in column B but rather is a sum of 1 from column A, 2 from column B, 
and 3 from column C. 

A more serious problem than the misleading table is the fact that the discussion of results only 
considers those 19 subjects ‘who evaluated the proposition as a converse or as an illative 
implication and selected a pattern of cards predicted by the strategy model’. The authors note 
that the two subjects using an illative interpretation but choosing p and q do not conform to the 
model, but fail to note that the subject using the illative implication but choosing p and p̄ (a 
pattern not predicted by the model for any combination of strategy and interpretation) is equally 
disconfirmatory. In addition, there are four subjects giving interpretations of the proposition not 
conforming to the two legislated in advance who are for this reason discarded from the analysis. 
These subjects do not inherently falsify the model; it would be useful to see what interpretation 
each of them does use and whether their choice patterns could result from the application of 
one of the three Brée strategies to their interpretations. 

In conclusion, the Brée & Coppens article seems to suffer from inaccuracies in making 
predictions, a misleading presentation of results, and questionable exclusion of subjects from the 
data analysis. It should be emphasized, however, that this in no way discredits the Brée model, 
which remains a viable and interesting model of four-card task performance, and which may 
have broader implications for deductive reasoning as well. 
A reply to Moshman 


D. S. Brée and G. Coppens 


Moshman criticizes our article on three counts. We reply to each in turn: 


1. Inaccuracies in making predictions. We lament that we were indeed inaccurate in specifying 
what the strategy model would predict in the example we used, a point that Wason has drawn to 
our attention. Moshman is correct in assuming that the predictions in Table 1 are correct. For 
the record the text should be altered on p. 581 as follows: 


line 13: ‘...only if they use strategy C.’ to ‘...if they use strategy B and C.’ 
line 16: ‘...strategy C.’ to ‘...strategy B or C.’ 
line 17: ‘...either strategy A or B...’ to ‘...strategy A...’ 
line 23: ‘...odd digit (q̄)...’ to ‘...consonant (p̄)...’ 
line 24: ‘...should always...’ to ‘...may sometimes...’ 

Our apologies to other readers who have been inconvenienced. 


2. Misleading presentation of results in Table 2. Moshman is quite correct in his reading of 
Table 2. The layout could have been clearer. 


3. Questionable exclusion of subjects from the data analysis. It frequently happens that a few 
subjects in the Wason selection task make selections that are not accounted for. There is one 
such subject in this case who selected p and p̄. Unfortunately such subjects are of little help in 
deciding between the insight and strategy models. Protocol evidence (Brée, 1975) indicates that 
some at least of such subjects believe that it does not matter which cards are selected. 

The four subjects who selected p and q had the following interpretation(s) in the evaluation 


Subject       pq    pq̄    p̄q    p̄q̄ 
VE4           +     0     0     ? 
EV4 & EV8     +     0     0     0 
EV1           +     −     −     ? 


where + is conforming, − is disconfirming and 0 is irrelevant. In each case the simplified version 
of the insight model would predict, in the case of ‘no insight’, a selection of p and q. This was 
indeed the selection of all four subjects. To make predictions from the strategy model it would 
be necessary first to infer what underlying representation could give rise to these evaluations. 
This is more than we are prepared to do. 

We thank Moshman for his clarification and criticism. 
The Dynamic Personality Inventory: What does it measure? 


Paul Kline and Ron Storey 





The Dynamic Personality Inventory, the 16PF, the EPI, the Gottheil scale, the Lazare scales, Ai3Q, OPQ and 
OOQ were administered to 128 subjects and subjected to an oblique Promax factor analysis from which 15 
factors emerged. It was established that the DPI scales were independent of extraversion and neuroticism 
and were measuring variance unaccounted for by the 16PF and EPI. Two factors, masculine and feminine 
interests, were identified as clearly measured by the DPI, and the A scales were also found to be consistent. It 
was concluded that the DPI was a useful test where more orthodox factored personality tests were wanting. 





The Dynamic Personality Inventory, the DPI (Grygier, 1961) based upon an earlier 
psychoanalytic test, the Krout Personal Preference Scale (Krout & Krout, 1954), utilizes 
psychoanalytic personality theory as a starting-point for test construction modified in the light of 
factorial studies, item analyses, psychometric data and clinical experience (Grygier & Grygier, 
1976). It purports to measure the tendencies, sublimations, reaction-formations and defence 
mechanisms associated with the various patterns of psychosexual development and contains 33 
scales, constructed from factor analyses of inter-item correlations, and of high reliability, most 
of them being around 0.8. 

There are several reasons why the DPI deserves rigorous investigation. First, as Sells (1972) 
points out, if the claims in the earlier test manual were correct (Grygier, 1961), the DPI would be 
about the best personality test ever constructed. Secondly, the 33 scales appear to measure 
variables quite different from other personality tests, in particular the 16PF test (Cattell, Eber & 
Tatsuoka, 1970) which, of course, claims to measure the major variance in the personality 
sphere, and the EPQ (Eysenck & Eysenck, 1975) with similar pretensions. Furthermore the DPI 
items, which consist of words or phrases to which subjects have to indicate ‘like’ or ‘dislike’, are 
distinctively different from those in most other personality tests. In addition the DPI seems able 
to discriminate meaningfully among different kinds of student (Stringer, 1967; Hamilton, 1968). 
These studies involved architectural, science and arts students, groups not easily separated by 
personality tests. Grygier & Grygier (1976) have also shown that the DPI is useful in the 
investigation of criminal and other clinical groups. Finally, should the scales unequivocally be 
shown to be valid, objective evidence relevant to Freudian personality theory would be 
provided. In brief, therefore, validity studies of the DPI would be highly relevant to the factor 
analytic structure of personality, to psychoanalytic developmental theory and to atheoretical, 
empirically established personality differences between groups. 

Grygier & Grygier (1976) have recently presented new data relevant to the validity of the DPI. 
They demonstrate that with three-day assessment procedures and psychoanalytically oriented 
interviews as a criterion, the DPI performed better than did most other personality tests. They 
point out, too, that since the DPI is intended to be interpreted as a whole, using the profiles, and 
taking advantage of the information from the 33 scale scores (in contrast to the three scales of 
the EPQ), rigid validity studies of individual scales are not appropriate. 

Although this argument has some force, it still remains the case that the DPI scales must be 
measuring personality variance. It is therefore important to know what this variance is and 
factor analysis of the DPI with other tests would answer this question. 

Kline (1972, 1973), Cattell & Kline (1977), and Grygier & Grygier (1976) have fully surveyed 
the factor analytic studies of the DPI. In brief, it can be argued that the validity of the DPI 
scales has never been demonstrated factor analytically, because the method which necessitates 
comparing the DPI factors with other known factor systems, e.g. those of Cattell, Eysenck and 
Guilford, has never been tried, perhaps because it is so time-consuming. Thus research has been 
restricted to factoring the DPI on its own (e.g. Stringer, 1970) or with one or two other scales 
(Kline, 1968b). This of course can tell us only about the structure of factors within the DPI, not, 
and this is the crucial point, what these factors are. Such information would be valuable for 
assessing one aspect of the validity of the DPI. 


The present investigation 

In the context of this criticism of previous attempts to investigate the validity of the DPI it was 
decided to factor analyse the DPI, the 16PF, the EPI and certain other scales purporting to 
measure psychosexual traits, as part of a larger investigation into orality. 


Rationale of design 


This design would enable us to locate the DPI factors relative to the best-known personality 
factors, those of Cattell and Eysenck. It would also enable us, to some extent, to check up on 
the validity of factors which were independent of the major personality systems. 


Tests used 


(1) 16PF test. Form A. These factors can be regarded as marker variables since much is known about the 
psychological meaning of the factors (see Cattell, 1973). 

(2) The EPI. Form A. These were similarly regarded as marker variables. The EPQ was not used because 
it was not available at the time of testing. 

(3) Ai3Q (Kline, 1971). This test measures anal or obsessional traits and has been shown previously to 
correlate with the anal scales of the DPI (Kline, 1968 b). 

(4, 5) OPQ and OOQ (Kline & Storey, 1977). These are two 20-item scales measuring oral optimistic and 
oral pessimistic traits. Their construction and evidence for their validity have been fully discussed in Kline & 
Storey (1977). These will be useful checks on the DPI ‘O’ scales. 

(6) Gottheil Scale (Gottheil, 1965). A 40-item scale of orality, inserted as a check on the DPI ‘O’ scales. 

(7) Lazare oral scales (Lazare, Klerman & Armor, 1966). These scales, based on the work of Goldman-Eisler 
(1948) which has been shown to be promising by Kline (1972), were included as a further check on the 
‘O’ scales of the DPI. Following the factor analyses of Lazare et al. (1966) and Paykel & Prusoff (1973), six 
scales were used: dependence (D), egocentricity (E), fear of sexuality (F), passivity (Pa), pessimism (Pe) and 
self doubt (Sd). 


(8) The Dynamic Personality Inventory. The DPI scales are set out below. 
H     Hypocrisy 
Wp    Passivity 
Ws    Seclusion and introspection 
O     Orality - liking for creamy foods 
Oa    Oral aggression - liking for crunchy foods 
Od    Oral dependence 
Om    Need for independence 
Ov    Verbal aggression 
Oi    Impulsiveness 
Ou    Unconventionality 
Ah    Hoarding 
Ad    Attention to detail 
Ac    Conservatism, rigidity 
Aa    Submissiveness to authority 
As    Anal sadism 
Ai    Insularity 
P     Interest in objects of phallic significance 
Pn    Narcissism 
Pe    Exhibitionism 
Pa    Active Icarus complex 
Ph    Fascination by heights (passive Icarus complex) 
Pf    Fascination by fire 
Pi    Icarian exploits 
S     Sexuality 
Ti    Enjoyment of tactile impressions 
Ci    Creative interests 
M     Masculine sexual identification 
F     Feminine sexual identification 
MF    Tendency to seek social roles 
SA    Interest in social activities 
C     Interest in children, need to give affection 
Ep    Ego-defensive persistence 
Ei    Initiative, tendency to plan 
Subjects 

Subjects were 116 students and 12 older adults. Sixty-one were males. 


Statistical analysis 

Product moment correlations between the test variables were subjected to a principal components analysis 
followed by a Promax (Hendrickson & White, 1964) oblique rotation in which components with eigenvalues 
greater than one were rotated. 


Results 


Since the results of this study with so large a number of variables are necessarily complex, and 
since rotated factor analyses are intended to simplify them, only the Promax rotation is presented 
(in Table 1). For ease of comprehension only loadings greater than 0.3 are shown, loadings 
usually regarded as significant in this type of work, since 0.3 is well beyond the significance level 
for a correlation. Where necessary, of course, in our discussion, correlations between variables 
will be given. 
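
The claim about the 0.3 criterion can be checked with the usual t test for a product-moment correlation. This sketch assumes that a loading is treated like a correlation based on the full sample of 128 subjects, and the critical value of roughly 1.98 (two-tailed 5 per cent level, 126 degrees of freedom) is an approximate figure supplied for this example, not one taken from the article.

```python
import math


def correlation_t(r: float, n: int) -> float:
    """t statistic for testing a product-moment correlation against zero (d.f. = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


N_SUBJECTS = 128
LOADING_CUTOFF = 0.3
APPROX_CRITICAL_T = 1.98  # assumed approximate two-tailed 5 per cent value for 126 d.f.

t_value = correlation_t(LOADING_CUTOFF, N_SUBJECTS)
print(t_value, t_value > APPROX_CRITICAL_T)
```

On these assumptions a value of 0.3 corresponds to a t of about 3.5, comfortably above the conventional criterion.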


Discussion of results 


Before we begin to identify the factors presented in Table 1, a few points about the design of 
this investigation deserve note. First the aim of the study was to reach simple structure for, as 
has been shown by Cattell (1973) and Cattell & Kline (1977), when this is done, complementary 
systems of personality factors such as those of Cattell, Guilford and Comrey fall into place. 
Thus, for example, in the case of Comrey, apparent differences between his factors and those of 
Cattell disappear. Comrey’s factors are essentially the second-orders of Cattell. The assumption 
here is that the problem of the infinity of possible factor analytic solutions is solved by reaching 
simple structure, which yields meaningful and replicable factors. 

Has simple structure, however, been obtained in this investigation? We are aware that Cattell
(1973) has claimed that analytic programmes such as Promax are inferior to topological
programmes, such as Maxplane (Cattell & Muerle, 1960), in reaching simple structure. While this
is true in some cases, in others there is little practical difference in the solutions. In the present
investigation we can check the adequacy of the solution because the factor structure of both the
16PF and the EPI is known. Thus if the expected factors are obtained from the tests we can be
confident that our solution is not inaccurate, or at least no worse than that usually obtained with
these tests. Fortunately, as we shall see, Promax here gave a good solution. The results are
trustworthy.

Finally, the fact that the best-known personality factors emerged answers any misgivings
about the relatively small size of the sample, although there were well over twice as many
subjects as variables, an essential of sound factor analysis (Guilford, 1954).


Interpretation of factors 


The first question to be considered is the extent to which the DPI factors are independent of the 
EPI and 16PF factors which are held to subsume much of the variance in the personality sphere. 
To answer this we must first look for the two largest second-order factors: extraversion, or exvia,
and neuroticism, or anxiety. How do the DPI factors load on these?

Factor 3 is clearly the anxiety factor. It has high loadings on Cattell’s C, O and Q4 and the 
EPI N. It is therefore interesting to note that no DPI factors load on factor 3. This enables us to 
state unequivocally that all the DPI scales are independent of the anxiety or neuroticism factor. 
This certainly means that the DPI is measuring variables untapped by the main factored 
personality tests. If some profile of DPI scores were related to anxiety, then small loadings on 
the relevant factors would be expected. 

Table 1. Promax rotation of all personality tests [columns: factors 1 to 15; rows: DPI variables, 16PF variables and the other scales; loadings not recoverable]

Factor 8 is the extraversion factor. It has high loadings on Cattell’s A, H, F, Q2(−) and the
EPI E. Only three DPI scales load on factor 8: Sa, sociability, Pe, exhibitionism, and Ei,
initiative. Since the highest loading of all was on Sa, it may well be that this factor is one of 
gregarious sociability. The Pe and Ei loadings while sensible were not expected. From the 
viewpoint of the validity of the DPI we can say that all but three scales are independent of 
extraversion. Thus factors 3 and 8 indicate the DPI is largely independent of the two largest 
second-order factors in personality. There can be no doubt that the DPI overlaps the EPI only to 
the slightest extent. 

The relation of the DPI scales to the two smaller Cattell higher-order factors, which are not so
well defined (sensitivity and dependence), will be examined as we discuss the other factors. We
shall now examine the relation of the DPI scales to the most pervasive factor of all, g,
intelligence. Clearly the DPI should not load on an intelligence factor, if the scales are valid.

Factor 9 is the intelligence factor, loading on Cattell’s B. No DPI factor loads on factor 9, 
although S, admission of sexuality, loads 0.297, thus indicating that the more intelligent subjects
are prepared to admit their sexuality. Nevertheless factor 9 enables us to say that the DPI scales 
are independent of the intelligence factor. 

From these three factors, therefore, it seems reasonable to argue that the DPI is measuring 
variance largely untapped by the major personality tests. We must now attempt to identify this 
variance. Do the scales load up on factors as expected from their names? Of course even if they 
do, in most cases before a proper identification could be made we should need some external 
evidence in support. 

Factor 1 is either the anal or superego factor. It is a virtual replication of the first factor found
by Kline (1968a) in his study of the anal character, which gives us considerable confidence in its
stability. It loads on all the DPI A (anal) factors as it should (except Ah), H, hypocrisy, S,
sexuality, negatively, G, Cattell’s superego, Q3, self-control, and Ai3Q anal character. Not only
is its similarity to the previous anal factor convincing but the loadings on Q3 and G all support
the identity of this factor as one of rigid self-control, characterized by anal or obsessional
characteristics. Thus factor 1 confirms the validity of the Grygier A scales (except Ah) and
incidentally the concept of the anal character. The H and S loadings confirm the priggishness, the
refusal to admit weaknesses, inherent in this set of personality traits. It is to be noticed that
factor 1 refutes, as Kline (1978) has pointed out, the rather weak objections to the identification
of the 1968 factor as anal by Hill (1976). In brief, then, factor 1 supports the validity of the A
scales of the DPI.

The other factors can be dealt with more briefly. Factor 2 with its huge loading on Cattell’s L 
(such loadings are possible with Promax, cf. Hendrickson & White, 1964) is clearly distrust or 
suspiciousness. Only two DPI scales load on it — Ov, verbal aggression and Pe, exhibitionism. 
This is some support for the validity of Ov as verbal aggression but again it is a factor largely 
independent of the DPI. Factor 4 is a pure DPI factor: feminine interests. Its loadings are
of considerable psychological interest since (pace women’s lib.) it links feminine interests with
liking for children, dependence, creativity and liking for food, both creamy and hard and strong
tasting. Eating disorders are characteristically feminine, e.g. anorexia nervosa, so this factor
makes good psychological sense. Incidentally it should be noted that this factor is contrary to 
the Freudian theory of orality since liking for the two different kinds of food is held to be due to 
fixation at the oral erotic or the oral sadistic phase. Factor 4 demonstrates that the DPI measures 
an aspect of personality missed out by the 16PF and the EPI. 

Factor 5 is another DPI factor, this time of adventurous (phallic) masculinity, with loadings on
Icarian exploits, Pi, and Pa, the active Icarus complex. The opposed loading on Ov suggests that
strong, active, adventurous people tend to keep silent and avoid verbal aggression. Only Cattell’s
N loads on this factor so, as with factor 4, factor 5 demonstrates that the DPI measures an
aspect of personality, masculine adventurousness, missed out by the 16PF and EPI. Factor 6 is a
mixture of DPI and 16PF scales. Loading on Wp, passivity and Pn, narcissism, together with O,
interest in creamy foods, it suggests an indolent, luxury-loving lounge-lizard. This is supported
by the loadings on I.

Factor 7 loads on only three DPI factors and from its large loading on Oi is probably best 
thought of as impulsivity, an identification supported by the loadings on the other tests. With 
factors 6 and 7 and with the subsequent factors in our discussion, which are all of small 
variance, we now have to consider, even when they can be identified, whether or not we are 
dealing with what Cattell has called bloated specifics (Cattell, 1973). Thus if we construct tests 
with somewhat similar content it is possible that they load up to form a substantial factor. 
However, the factor indicates only the way the items have been constructed rather than any 
genuine psychological entity. The distinction between bloated specifics and true group factors 
always becomes obvious with further experiments in which the factor is studied in relation to 
external criteria. Even if these factors are not bloated specifics, and 6 and 7 do load on a variety 
of tests making it less likely than if they were pure DPI factors, it must still be accepted that 
these are factors of small variance and thus perhaps of more restricted psychological interest 
than the major factors which we have discussed. 

Factor 10 is probably a factor of practicality with its loadings on the Cattell M and N factors. 
Of the DPI only Om, the need for independence, loads on it. Factor 11 resembles a possible
bloated specific with its loadings on Oa, liking for crunchy foods, Od, dependence, Ou,
unconventionality and Ws and Wp, introspection and seclusion. We might have interpreted this
as an oral factor but none of the Lazare, Gottheil, OOQ or OPQ oral scales loads on it. Factor
12, similarly, is probably a bloated specific loading on E, persistence and Cattell’s Q1. Factor 13
has no DPI loadings and is probably a factor specific to the Lazare tests. Factor 14 again seems
of little importance, loading highest on the Gottheil oral scale.

Factor 15, however, is of some psychological interest. With loadings on Ci, creative interests, 
Ah, anal hoarding (negatively) Ti, tactile interests, and P, Ph, the phallic scales, it is support for 
the Freudian theories of sublimation in art, where liking for clay and paint is a sublimation for 
handling faeces (the Ah loading) and the P scales testify to the phallic significance of art! 

So much for the 15 factors. In view of the fact that their identification could be only tentative 
(unless the 16PF or EPI markers were involved) it was not considered useful to proceed to the 
extraction of second-order factors. However, the correlations between these Promax factors 
were in the main non-significant, and the highest correlation was only 0.385, between 2, distrust 
and 14, dependence. Since from the actual polarity of the loadings, factor 2 emerged as trust 
(rather than distrust), this correlation is certainly sensible. Examination of our test of factors 
suggests only one obvious correlation — a negative one between factor 1, anality and 7, 
impulsivity, and this is indeed found (−0.354). The only other correlations between factors
greater than 0.3 were between 4, feminine interests and 8, sociability and between 4 and the
Lazare factor 13. 


Conclusions 


The rationale of this investigation was to attempt to locate these DPI factors within the 
personality sphere, and thus identify the factors, as far as is possible without external referents. 
In fact this analysis of the DPI with a variety of other scales has enabled us to draw a number 
of conclusions, as we have discussed in the Discussion of results section, and these can be 
summarized. 

(1) The DPI scales are independent of the two largest personality factors, extraversion or 
exvia and anxiety or neuroticism. 

(2) The DPI scales are independent of the intelligence factor g. 
(3) The A scales of the DPI do load up as expected and provide a measure of anal or
obsessional personality, a surface trait not a source trait in the Cattell system.

(4) The DPI can provide scales for measuring feminine interests, an aspect of personality not
included in the 16PF (factor 4).

(5) The DPI can similarly provide scales for measuring masculine interests (factor 5).

(6) In factor 15, there was some support for the Freudian theory of sublimation and the P
scales of the DPI.

(7) There was no support for the O, orality scales of the DPI.

From this study of the DPI we would argue that, for special purposes, e.g. the investigation of
masculinity or femininity or for the study of artistic creativity or in those fields where more
orthodox personality tests have little success, such as criminality, the DPI is likely to be a
useful test. There can be no doubt that it measures variables not included in the best-known
factored personality tests.


Summary 


The DPI, the 16PF, the EPI together with other measures of oral and anal personality traits, were subjected 
to a rotated factor analysis from which 15 factors emerged. Study of these factors revealed that the DPI was 
independent of E and N. The ‘A’ scales of the DPI were shown to be consistent and factors of masculine 
and feminine interests were identified. It was concluded that the DPI was a useful personality test. 
Acquisition and performance difference between normal and mentally
handicapped adults on a complex assembly task 


Eileen Tomlinson and Edward Whelan 





Several single studies concerned with work training with the mentally handicapped have been reported in the 
literature (Bitter & Bolanovich, 1966; Huddle, 1967; Gold, 1969; Screven, Straka & Lafond, 1971). Most 
have been concerned with issues of acquisition and motivation to perform. No studies have been reported 
which focus on the evaluation of different techniques of training or which assess rate of acquisition against 
that of non-handicapped controls. 

The aim of this experiment is twofold. It uses a complex task, analysed by MTM-2, to enable comparison 
between the performances of mentally handicapped adults and adults from a ‘normal’ population, in the 
acquisition of new work skills. Various strategies of training were compared and the increase in the rate and 
quality of performance following acquisition was measured for both, in terms of speed and accuracy of 
production. Results add further support for the notion that the potential of mentally handicapped individuals 
is commonly underestimated. It is hoped that the findings provide a basis from which further experiments 
can be developed and evaluated. 








There is little evidence in the literature concerned with the experimental investigation of the 
relative effectiveness of training techniques with the retarded, particularly in the area of skill 
training. Practically all the experimental studies related to training and performance in this area 
are operant, using reinforcement contingencies as their main feature. 

Non-experimental research on vocational training tends to concentrate on predictive 
assessment and evaluation rather than training (Tobias, 1960; Meadow & Greenspan, 1961; 
Elkin, 1967; Palmer, 1974). As Gold (1972) aptly comments: ‘No attempt has been made to 
distinguish between learning ability and production ability, and no attempt has been made to 
make the evaluation period fruitful to the client in terms of the development of the skills which 
are being evaluated.’ Gold (1973) further contends that practically all of the assumptions 
presently held related to the manner in which the mentally handicapped function in the context 
of work must be questioned. He claims that the literature does not acknowledge the relationship 
between task complexity and productivity, neither is the study of a possible relationship carried 
out. 

A wide discrepancy exists between what the retarded do in a vocational sense, and what they 
are potentially capable of doing, both quantitatively and qualitatively. This theme echoes the 
pioneer work of research workers 20 years previously (Gordon, 1953; O’Connor & Claridge, 
1955). 

Clarke & Hermelin (1955) posed four important questions: (a) could these individuals pursue a 
full day’s work on industrial tasks?; (b) could they acquire comparatively difficult skills?; (c) to 
what extent does initial ability relate to final achievement?; and (d) what are the limits to their 
trainability, and what practical and theoretical implications emerge? 

The present investigation was undertaken in order to answer mainly the second of these 
questions. In particular, it makes use of a complex task (handwheel assembly), submitted to a 
standard industrial analysis, presented to both normal and mentally handicapped subjects. This 
facilitates comparison of acquisition and enables performances to be compared with those of 
skilled industrial operatives. 
Method of selecting sample


From a group of 50 subjects, five different samples were obtained, whose ages ranged from 20 to 45 years. 
Samples 1 and 2 each consisted of ten ‘normal’ subjects, five of each sex, chosen from a non-engineering 
background. Samples 3, 4 and 5 were made up of trainees from an Adult Training Centre. 

In selecting the mentally handicapped trainees, care was taken to ensure that individuals were not said to 
be in need of ‘special care’ or to have been principally diagnosed as mentally ill. Also, trainees were
eliminated if they suffered from gross visual defects or spastic involvement of the hands, or if they were on 
prescribed sedation. 

In all other respects, the 30 mentally handicapped individuals who were subsequently selected were typical 
of the ability range to be found in an Adult Training Centre (mean IQ approx. 50). For the purpose of the 
present study it was important to ensure their representativeness of the ATC population in respect of manual 
skills. Trainee selection, therefore, took account of time and error scores obtained on a ‘positioning task’, 
previously developed by Grant & Whelan (1971) as part of a battery of tests which were used for producing 
individual profiles of work skills in mentally handicapped adults. The time scores were ranked in ascending 
order, on the basis of which individuals were divided into three ability groups. 

Experimental samples 3, 4 and 5 thus each consisted of ten trainees taken proportionally from the three 
groupings of trainees on the positioning task. They were matched groups, each consisting of five males and 
five females. The three groups were each to be subjected to a different method of training. 
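
The selection procedure described above (rank the positioning-task time scores, divide the trainees into three ability groups, then draw each training sample proportionally from those groups) can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' actual procedure: the trainee labels and time scores are invented, the function name `assign_matched_samples` is hypothetical, and the additional matching on sex and the use of error scores are omitted for brevity.

```python
import random


def assign_matched_samples(time_scores, n_samples=3, seed=0):
    """Split trainees into three ability groups by ranked time score and
    deal each group out across the training samples as evenly as possible.

    time_scores: dict mapping trainee label -> time score (lower = faster).
    Returns (samples, ability_groups), both lists of lists of labels.
    """
    ranked = sorted(time_scores, key=time_scores.get)   # ascending time
    third = len(ranked) // 3
    ability_groups = [ranked[:third], ranked[third:2 * third], ranked[2 * third:]]

    rng = random.Random(seed)
    samples = [[] for _ in range(n_samples)]
    position = 0  # running counter so any remainders rotate across samples
    for group in ability_groups:
        members = list(group)
        rng.shuffle(members)
        for trainee in members:
            samples[position % n_samples].append(trainee)
            position += 1
    return samples, ability_groups


# Hypothetical time scores for 30 trainees (not data from the study).
scores = {f"T{i:02d}": 40 + 3 * i for i in range(30)}
samples, ability_groups = assign_matched_samples(scores)
for number, sample in enumerate(samples, start=3):
    print(f"Sample {number}: {sorted(sample)}")
```

Dealing from a single running counter keeps the three samples at ten trainees each while ensuring every ability group contributes three or four members to every sample.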


The task 


The ‘handwheel’ (5% in long, 2 in diameter), assembled from 14 components, is normally fitted to the side of 
a variable speed gear-box, providing extremely sensitive speed regulation. It has numerous applications,
including spinning machines, cement manufacture, and heart-lung machines. 


Procedure 


In an attempt to procure information on the behavioural characteristics of a ‘normal’ population learning the
handwheel assembly task, a sample of ten subjects (five males and five females) was chosen from a
non-engineering background. The initial training method adopted was close to that used on the shop floor
with the provision for ensuring that components were laid out in a consistent array on the work bench on
each trial. The time taken to complete the assembly, the errors and items of difficulty were recorded.

From observations noted from this method of training, it became apparent that many of the errors
incurred during the assembly were the result of subjects’ ignorance of the sequence of the components.
Therefore, a second training method was introduced using a ‘sequence board’ and two additional strategies,
aimed at improving the ease of assembly. A different sample of ten ‘normal’ subjects was used, and data 
relating to time and errors again noted. 


Method 1


The components were scattered on a table around the subject but in order to standardize the distances 
between components, a consistency board was developed enabling the layout to be replicated on each trial 
and facilitating analysis by Methods-Time Measurement (MTM).

The same instructions were given to normal and mentally handicapped subjects. The experimenter 
showed the subject a completed handwheel assembly saying, whilst pointing to the components placed on 
the board: ‘These pieces when assembled make a handwheel like this one. I am going to put the pieces 
together, and whilst I do so I would like you to watch me, and then you can have a go.’ During the silent 
demonstration, care was taken to show the subject the interior of the casing, and how the pieces fitted 
inside. When the assembly was completed the experimenter dismantled the pieces and returned them to their
original position on the board. 

Three trials took place in which the experimenter told the subject: ‘You can start when I say “Go”, and I 
will help you should you need it’. The time was taken from start to completion and the errors requiring 
prompts were recorded. 

Most subjects had acquired the ability to perform the task after three trials but in all cases a further three 
trials were given. The time and errors were noted, and any unusual delays or problems were also recorded. 

In the case of SSN Group 3, a further 12 trials were given, three per week over four weeks in an attempt 
to learn more about characteristics of performance after acquisition. 




Method 2 


In this method a tray was provided in which the components and tools were placed in the order of assembly, 
from left to right. Once again, the subject was shown an assembled handwheel and received the same 
instructions and demonstrations as in Method 1. 

This procedure differed from that used in Method 1 in that: 

(i) Errors concerned with the sequence of components had been eliminated. 

(ii) The handwheel shaft was used to hold the body casing, whilst assembling the components in the
interior of the latter.

(iii) A piece of thick wire (similar to a short knitting needle) was used to aid in assembling the pinion, i.e.
the pinion was threaded on to the wire and dropped inside the casing.

Following a demonstration, the subjects were allowed six trials as before. The experimenter dismantled the 
pieces and returned them to their original positions on the tray after each assembly. 


Method 3 


This consisted of pretraining experience involving three different tasks, given prior to exposing the subject to 
the handwheel assembly. The pretraining tasks were as follows. 


Task A. This consisted of two parts, matching black and red numbers from 0 to 9 on (a) a board and (b)
three number wheels mounted side by side on a spindle. The subject was asked to find and match black and
red numbers, one at a time, in a predetermined sequence. This exercise was given to check that (a) subjects
could differentiate between the colours black and red; (b) subjects could recognize the numbers 0-9 and
match them. The outcome of this task was that the experimenter was confident that all subjects were capable
of matching numbers both on the number board and on the digital wheels.


Task B. One item of difficulty found in the handwheel assembly, and predicted from MTM analysis, was
locating a pin on the shaft into a small hole in the red digital wheel enclosed by the body casing. Although
the handwheel shaft could be fitted into the casing quite easily, unless the latter was seated firmly, the next
component, the circlip, could not be placed into the recess provided.

It seemed important, therefore, to inform the subject of this complex operation, and to provide a strategy
for dealing with the problem. In trying to ensure that the trainee understood what was required and had
some experience of asymmetrical positioning, a specially designed plug and socket board was used.

The board containing the sockets was placed in front of the subject. The six plugs were shuffled with the 
pins facing upwards. The subject was asked to fit the plugs into their respective sockets in any order, noting 
that each plug could only be located in one direction. The time and the sequence of matching were noted 
during the three trials. 


Task C. The use of circlip pliers is acknowledged in normal industrial practice to be a complex manipulative 
task, especially to the naive operative. Not only is guidance required in holding the pliers, but also
experience is necessary concerning its operation as the legs of the circlip pliers extend when the handles are 
squeezed, and close when the grip on the handles is relinquished. 

To use the pliers effectively, the subject must have the ability to: (a) span the handles of the pliers; (b) 
locate the ‘eyes’ of the circlip; (c) squeeze the pliers; (d) maintain constant pressure on the handles whilst 
the clip is held; and (e) relinquish the grip, when the clip is located in the appropriate position. The apparatus
was designed to provide experience in the manipulation of the pliers, involving the precise location of the 
‘eyes’ spaced at various distances. 

The holes on the board were arranged in pairs, in six compartments ranging from 7 to 22 mm in 3 mm
stages.

A demonstration was given by the experimenter in locating the six pairs of ‘eyes’. When the correct
pressure and location were achieved the bulb could be illuminated.

The subject was then requested to light up the bulb on each pair of holes. He was allowed three trials
(a, b, and c): (a) commencing with the smallest pair of ‘eyes’ and proceeding in ascending order, i.e. 1 to 6;
(b) commencing with the largest pair of ‘eyes’ and proceeding in descending order, i.e. 6 to 1; and (c) locating
pairs of ‘eyes’ alternately, i.e. 1-3-5-2-4-6. The time taken to carry out the three trials was recorded.
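
As a small check on the apparatus description, the six compartment spacings (7 to 22 mm in 3 mm stages) and the three trial orders for Task C can be enumerated as follows. This is only an illustrative sketch; the function names `hole_spacings` and `trial_orders` are hypothetical and are not taken from the study.

```python
def hole_spacings(start_mm=7, stop_mm=22, step_mm=3):
    """Spacing of each pair of 'eyes', smallest first, in millimetres."""
    return list(range(start_mm, stop_mm + 1, step_mm))


def trial_orders(n_pairs=6):
    """Order in which the pairs are located on trials (a), (b) and (c)."""
    ascending = list(range(1, n_pairs + 1))
    descending = ascending[::-1]
    alternating = ascending[0::2] + ascending[1::2]  # odd pairs, then even pairs
    return {"a": ascending, "b": descending, "c": alternating}


spacings = hole_spacings()
orders = trial_orders(len(spacings))
print(spacings)
print(orders)
```

A range from 7 to 22 mm in 3 mm steps gives exactly six spacings, which agrees with the six compartments mentioned in the text.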
Results


On the basis of the data resulting from the first six trials, experienced by both the normal and 
subnormal subjects, analysis of variance was carried out which (a) compared the performance of 
normal and subnormal subjects and took account of the sex, methods and trials, and (b)
compared the performance of the SSN subjects exposed to the three different methods, also 
taking account of possible effects of sex and trials. 


(a) Performance of normal and subnormal subjects 


Table 1 is in two parts, showing the analysis of variance carried out in respect of accuracy
(errors) and speed (time) of performance of the normal and subnormal subjects. It can be seen
that the two groups were not significantly different in respect of accuracy, neither was there a
significant sex difference. The speed of assembly of normal subjects was greater (P < 0.05) than
that of subnormal subjects where the same method of training was used. There was a highly
significant (P < 0.01) effect of trials, not surprising as this was a learning situation.
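
The F ratios discussed here are ratios of mean squares, where each mean square is a sum of squares divided by its degrees of freedom. As a minimal sketch of that arithmetic, the snippet below recomputes two entries of Table 1 from the tabled figures (trials, errors: sum of squares 952.8 on 5 d.f.; between-subjects, errors: 218.4 on 32 d.f.; ability, time: mean square 993821.4; residual, time: mean square 1370.4). The function names are illustrative, and the pairing of each effect with its error term (between-subjects mean square for trials, residual mean square for ability) is an assumption inferred from the tabled values rather than something the text states.

```python
def mean_square(sum_of_squares, df):
    """Mean square = sum of squares / degrees of freedom."""
    return sum_of_squares / df


def f_ratio(ms_effect, ms_error):
    """F = effect mean square / error mean square."""
    return ms_effect / ms_error


# Errors: effect of trials, tested here against the between-subjects term (assumed).
ms_trials = round(mean_square(952.8, 5), 1)       # tabled as 190.6
ms_between = round(mean_square(218.4, 32), 1)     # tabled as 6.8
f_trials_errors = round(f_ratio(ms_trials, ms_between), 3)

# Time: effect of ability, tested here against the residual term (assumed).
f_ability_time = round(f_ratio(993821.4, 1370.4), 3)

print(f_trials_errors, f_ability_time)
```

Working from the rounded mean squares, as the table appears to do, reproduces the tabled ratios; working from unrounded mean squares would give slightly different values.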

In order to determine the extent to which performance of the subnormal subjects might 
continue to improve with further trials, both in terms of accuracy and speed of assembly, each 
of the ten subjects was allowed three trials to assemble the handwheel on one occasion each 
week for four consecutive weeks, i.e. 12 trials. Positive reinforcement was given during this 
period, consisting both of individual competition between the trainees, with regular 
announcements of results by the ATC Manager, and the use of an illuminated ‘pacer clock’ in 
order to set individual targets. It should be noted that financial rewards were not applied. In
particular it was hoped to evaluate performance concerning its acceptability to industry, meeting
the criteria imposed by the factory.


Table 1. Analysis of variance comparing two methods of training normal and mentally
handicapped adults to perform an assembly task, measuring errors and time taken

                              Errors                                       Time
Source of        Sum of              Mean                  Sum of                 Mean
variation        squares     d.f.    square    F           squares       d.f.     square      F
A (sex)             49.5       1      49.5     33            46370.4       1      46370.4     33.837
B (ability)        310.5       1     310.5     297          993821.4       1     993821.4    725.205*
C (methods)        283.8       1     283.8     189.2         43309.1       1      43309.1     31.603
D (trials)         952.8       5     190.6     28.029**     187831.6       5      37566.3      1.3386
AB                   3.3       1       3.3     2.2            6955.3       1       6955.3      5.075
AC                   3.6       1       3.6     2.4           26125.1       1      26125.1     19.064
AD                  37.8       5       7.6     1.118          3243.4       5        648.7      0.0231
BC                  61         1      61       40.666        15876.3       1      15876.3     11.585
BD                 110.8       5      22.2     3.265          3307.7       5        661.5      0.0236
CD                 362.2       5      72.4     10.647*        1122.5       5        224.5      0.0079
ABC                 69.3       1      69.3     46.2          20387.1       1      20387.1     14.877
ABD                 14.9       5       2.9     0.426         11839.2       5       2367.8      0.0844
ACD                 15.8       5       3.2     0.471          3665.7       5        733.1      0.0261
BCD                 76.7       5      15.3     2.250         14348         5       2869.6      0.1023
ABCD                52         5      10.4     1.529          7701.6       5       1540.3      0.0549
Between-subjects   218.4      32       6.8     -            898063.3      32      28064.5      -
Residual           247.6     160       1.5     -            219263.5     160       1370.4      -
Total             2932.496   239       -       -           2503231.4     239       -           -

* Significant at the 0.05 level.
** Significant at the 0.01 level.


Figures 1 and 2. A comparison of normal and SSN samples assembling a handwheel.
Fig. 1 plots mean time (sec) against trials, with the time allowed for factory operatives marked; Fig. 2 plots mean errors against trials.
Broken line, Normals, method 1 (as used in factory); solid line, SSN, method 3 (sequence board and pretraining).

The results of these extra trials for the subnormal group, together with the six previous trials
for both normal and subnormal subjects, are shown in Figs 1 and 2, representing speed and
errors respectively.


In Fig. 2 the normal subjects were trained using only the consistency board. They represent a
naive factory operative, and their learning curve (dotted line) provides an acceptable basis for
comparing the acquisition of the subnormal subjects. As the intention of this study was to show
what could be achieved under optimum conditions of training in the case of the subnormal
group, the latter were provided with the sequence tray and thus initially made fewer errors (see
trial 1).


It may be concluded that all performances were acceptable as defined by the 12 min standard 
time allowed for factory operatives to complete the assembly. 


Table 2. Analysis of variance comparing three methods of training mentally handicapped adults
to perform an assembly task, measuring errors and time taken

                              Errors                                       Time
Source of        Sum of              Mean                  Sum of                 Mean
variation        squares     d.f.    square    F           squares       d.f.     square      F
A (sex)             31.3       1      31.3     18.41         22961.6       1      22961.6     15.58
B (methods)        545.7       2     272.8     160.5**      277714.8       2     138857.4     94.26*
C (trials)         710.6       5     142.1     15.28**      120609.5       5      24121.9      0.6022
A x B              162.7       2      81.3     47.83*        69310.7       2      34655.4     23.52*
A x C               21.7       5       4.3     0.462         11855.9      10       1185.6      0.0296
B x C              509        10      50.9     5.473          5506.7       5       1101.3      0.0275
ABC                 92.9      10       9.3     1             13951.3      10       1395.1      0.0348
Between-subjects   222.5      24       9.3     -            961415.5      24      40058.9      -
Residual           204.3     120       1.7     -            176908       120       1474.2      -
Total             2500.728   179       -       -           1660234.2     179       -           -

* Significant at the 0.05 level.
** Significant at the 0.01 level.
(b) Performance of subnormal subjects trained by three methods


Table 2 shows the results of analysis of variance applied to three methods of training the
subnormal group. These methods were: (i) use of consistency board, (ii) sequence board, and
(iii) sequence board plus pretraining. It can be seen that the method used had a significant effect
on time taken (P < 0.05) but surprisingly there was no increase in speed over the six trials for any
of the methods. This is because the emphasis in training was on accuracy of performance during
acquisition. The level of significance (P < 0.01) of the decrease in errors over trials shown in the
table confirms this.

The three methods did show a significant effect on errors incurred during acquisition
(P < 0.01). Methods of training showed increased efficiency in the order (i), (ii) and (iii) (see
above). The sequence board eliminated errors concerning the choice of the correct equipment to
select next in the assembly, whilst the pretraining exercises provided an opportunity for trainees
to develop some of the appropriate skills in advance of exposure to the experimental task.


Discussion 


All subjects were able to reach criterion performance, by whichever method of training they
were given. The task used has been described as one which most ATCs would not wish to
consider as a subcontract job on account of its apparent complexity. We might conclude,
therefore, that this study confirms earlier ones which have demonstrated that the expectations held
concerning the level of achievement of which mentally handicapped individuals are capable are too low.


Conclusions 


Gold’s data (1973), arising from his experiment with a brake assembly task, showed that the rate 
and quality of the work performed by the retarded, without pay and social reinforcement, was at 
a level far above what he currently found relevant to expectancies and practices in the 
vocational training and evaluation field. He also claims that their performance of this task 
appears to exceed by far, both qualitatively and quantitatively, any performance of the retarded 
reported in the literature. These findings raise questions about the assumption, presently held, 
that pay in some form and praise are the only reinforcers available for work. The lack of 
research investigating the relationship between task complexity and production leads Gold to 
propose that a complex task itself has strong reinforcing properties for the workers. 

As in Gold’s findings, the results of this experiment suggest that something more than the 
‘Hawthorne’ effect may be operating, namely that the complex task may offer intrinsic rewards. 
If so, then more should be done to increase the level and value of the work carried out in the 
ATC. 

It is possible that if a financial incentive had been introduced at this point, a typical Crespi 
‘elation’ effect might have been produced (O’Connor & Claridge, 1958). 

Further experiments will be reported in which the comparative effectiveness of other methods 
of training were evaluated using this task and one even more complex. 
Evidence for language recoding in autistic, retarded and normal children: 
A re-examination 


Christine Fyffe and Margot Prior 





In a replication and extension of earlier studies by Hermelin & O’Connor, language recoding abilities in 
autistic, retarded and normal children, matched for mental age and digit span, were compared in a verbal 
recall task. Random word lists, sentences, and anomalous sentences, eight or 12 items in length (for low and 
high memory span subgroups respectively), were presented and the number of words recalled from each type 
of input was scored. All low span children recalled sentences better than random lists, with normal children 
superior to retarded and autistic children and the latter group poorer than the retarded group. Autistic 
children showed a recency effect with both types of input. There were no group differences amongst high 
span children and sentences were again better recalled than random lists. In Expt II sentences were better 
recalled than anomalous sentences, with autistic and retarded children equivalent in performance and poorer 
than normal children. 

Although low span autistic children were clearly deficient in recall of sentence material when compared 
with the two control groups, the effect of conditions showed that they were able to use structure to improve 
recall. Since high span autistic children did not perform differently from controls it is suggested that results 
from this kind of study may not be generalizable, and that claims for a specific coding deficit in autistic 
children need further substantiation. 


Although we have made little apparent progress in a search for the aetiology of childhood 
autism, research to date has provided us with some understanding of the specific handicaps 
shown by children suffering from this disorder. In particular the ‘cognitive deficit’ theory, 
expounded by Hermelin, O’Connor, Frith, Rutter and their colleagues (e.g. see Rutter, Bartak & 
Newman, 1971) has received considerable attention and support. This theory says in effect that 
the basic handicap in autism is a cognitive one, with its most marked effects on language 
functions; that autistic children are unable to code environmental input meaningfully; to relate 
experience to a storage of ‘concepts in code’; to apply the ‘rules’ of language; to make sense of 
what they see and hear (Hermelin & O'Connor, 1970; Hermelin & Frith, 1971; Rutter et al. 1971; 
Rutter, 1974; Wing, 1975). A number of experimental studies are frequently cited in support of 
this theory. 

However despite the large quantity of research there has been a notable lack of reports of 
replication of experimental work and of any systematic evaluation of some of the basic 
assumptions of the theory. The experiments reported were undertaken to do just that in 
reference to some of the frequently cited work of Hermelin & O’Connor. A replication and 
extension of one of their major investigations was carried out along with an attempt to measure 
the effects on their conclusions of perceived flaws in their procedures. 

Contemporary views of autism stress the significance of the abnormality of language 
development which is believed to be central to the disorder (Hermelin & O'Connor, 1970; 
Churchill, 1972; Rutter, 1972 a, b, 1974; Bartak, Rutter & Cox, 1975). Language ability is thought 
to depend on the existence of a set of organizing principles (Hermelin, 1971), and it is suggested 
that it is the absence of feature extraction processes in autistic children that affects both 
language and social skills. O’Connor & Hermelin (1967) investigated the effect of sequential 
order and of recency on the immediate recall of verbal input with normal and autistic children. 
Subjects, grouped according to digit span (either three or four) were presented with wordlists for 
recall which were either six or eight items in length. The rationale behind this type of experiment 
is that in order to be able to recall verbal material which is greater in length than auditory 
memory span, the subject will have to recode the material to remember more items, i.e. to make 
use of redundancy in the input. The lists comprised two distinct halves which were either short 
sentences or random wordlists, which, used in all possible combinations gave four types of lists. 
Recall was scored according to the number of words correctly recalled in the correct order. 
Results showed that sentences were always better recalled than random lists although this 
difference was less pronounced for the autistic group. All subjects showed a marked recency 
effect which was particularly strong with the autistic group. The authors concluded that meaning 
was appreciated less by the autistic children than by the normals and that echolalia and an 
‘echobox’ type memory store contributed to the recency effect for the autistic children. 
However no retarded control group was included in this study thus leaving an important variable 
uncontrolled. 

A further Hermelin & O’Connor study (1967 a) has been widely cited as evidence that autistic 
children are deficient in their processing of ordered input. This study showed no difference 
between the recall of sentences and random wordlists by autistic children. However several 
methodological problems limit the conclusions. Although children were supposedly matched for 
digit span, autistic children recalled significantly more random material than retarded children. 
This suggests inadequate or unreliable matching. Secondly, because an entire list had to be 
accurately recalled to be scored (longer lists usually scored zero), the results were based on lists 
of only three and four words in length, i.e. well within the memory spans for most children and 
thus requiring no recoding. Thirdly the wordlists presented included questions. Menyuk & 
Looney (1972) have shown that such sentences are not as readily remembered by children with 
language difficulties as active declarative and imperative sentences. The second part of this study 
involved the recall of word lists consisting of two groups of related words interspersed 
throughout the list. Overall recall scores for the two groups did not differ although the retarded 
children showed considerably more clustering of related material. However there were no 
differences between the recall of lists containing related items and unrelated lists. This too may 
have been a consequence of the doubtful matching for digit span. No attempt was made to 
assess the effects of practice over the two parts of the experiment. 

A further criticism of the O’Connor & Hermelin conclusions is that they are overgeneralized 
since their results have not held up with groups of high span autistic children. Frith (1970 a) 
found that the recall of high span autistic and normal children did not differ when they were 
given verbal input consisting of various types of binary patterns of words. However her results 
with low span autistic children were in line with those of Hermelin & O’Connor and Frith too 
argued that this was evidence for deficiencies in feature extraction processes with this group. 
Low versus high span differences were also evident in Aurnhammer-Frith’s (1969) study which 
compared the recall of random lists and sentences by autistic and normal children. All groups 
were able to improve recall with sentences but the group that benefited least from the presence 
of structure and meaning was the low span autistic group. However some low span lists again 
were within the memory span of some subjects and this could have reduced the effect of 
structure for such children. This study also lacked a retarded control group. 

Thus there are both methodological and interpretative problems with parts of these studies 
which suggest that a re-examination is desirable. It is not clear, as Hermelin, O’Connor and Frith 
claim, that autistic children are not responsive to sequential input and respond to patterned input 
as if it were random. Nor is it clear that such an hypothesis is applicable to a range of autistic 
children of different levels of functioning despite the fact that language handicaps remain even 
when functioning in other areas improves markedly. The proposed ‘echobox’ type memory store 
which is hypothesized on the basis of recency effects in verbal recall performance also needs 
further investigation since there is evidence that recency effects are developmentally influenced 
(e.g. Deese & Kaufman, 1957; Brown & Fraser, 1964); and thus may not be a feature of autism. 

The experiments to be described were designed to test the conclusions of O’Connor & 
Hermelin (1967), Hermelin & O’Connor (1967 a) and Frith (1970) by comparing recall of various 
types of structured and unstructured verbal input with autistic, retarded, and normal children 
matched for mental age and auditory memory span. 


Experiment I 
Method 


Subjects. A total of 20 autistic, 20 retarded and 20 normal children matched for mental age were tested. The 
retarded and autistic children were also matched for chronological age and intelligence level. Subjects were 
further divided according to their immediate memory span as assessed by the Auditory Sequential Memory 
Subtest (Digit span) of the Illinois Test of Psycholinguistic Abilities. Those with a digit span of 5 or less 
were termed the low span group; those with a span of 6 or more were termed the high span group (Tables 1 
and 2). 


Table 1. Condition 1: low span groups: means and standard deviations of groups’ chronological 
age, mental age, IQ and digit span (n = 14) 

                       C.A.               M.A.           Digit span    Verbal IQ     Performance IQ 
                   M        S.D.      M        S.D.      M      S.D.    M     S.D.     M     S.D. 
                 Yr Mth    Yr Mth   Yr Mth    Yr Mth 
Autistic group    1  3      2  4     6  8      1  8     4.29    0.91    58    ë 3      66    16 (n = 10) 
Retarded group    2  2      3  8     6  8      1 10     4.36    0.6     59    12        6    12 
Normal group      6  9         3     6  9         3     4.71    0.47   100            100 
                                    (approx.)                        (approx.)      (approx.) 

Note: (1) C.A. = chronological age, M.A. = mental age. (2) Performance IQ measures were not available for 
four autistic children. (3) IQ and M.A. measures for the normal children are estimates. (4) IQ scores were 
obtained from the following tests: WISC, Binet, Leiter performance scale, and Peabody Picture Vocabulary 
Test. 


Table 2. Condition 2: high span groups: means and standard deviations of groups’ C.A., M.A., 
IQ and digit span (n = 6) 

                       C.A.               M.A.           Digit span        IQ 
                   M        S.D.      M        S.D.      M      S.D.    M     S.D. 
                 Yr Mth    Yr Mth   Yr Mth    Yr Mth 
Autistic group   13  3      3  6     9  1      2  4     6.83    0.79    80    15 
Retarded group   11  6      5       10  1      1  6     6.33    0.57    84     9 
Normal group      9  9         3     9  9         3     6.33    0.57   100 
                                    (approx.)                        (approx.) 





For all children English was their first and only language. The diagnosis of autism was based on the Prior 
et al. (1975) taxonomic classification; the criteria for the diagnosis of autism were thus essentially the same as 
those of Rutter (1974) and those used by Hermelin & O’Connor (1970) in their experimental work. None of 
the children was institutionalized and all were attending various day centres and special schools. The normal 
children were from an Education Department Primary School and were chosen by their teachers as ‘average’ 
students. 
Materials. The wordlists were made up of monosyllabic words from the Mein & O’Connor (1960) list of 
words commonly used by subnormals. All words used had a frequency of occurrence of 50 per cent or 
greater. Two types of wordlists were constructed: sentences and random wordlists, with the words from each 
sentence rearranged to form the corresponding random wordlist. For example: 

Sentence: The man went to see his horse run 

Random list: went horse the see his run man to 

There were six active declarative sentences and three imperative sentences but, unlike the 
Hermelin & O’Connor study, no questions. 

For low span children wordlists were eight items long, i.e. well above immediate memory span; for the 
high span children wordlists were 12 items in length. For the latter group, wordlists were extended versions 
of the low span sentences, e.g. The nice man went up this road to see his horse run, with corresponding 
random lists. Thus 18 wordlists were presented to each child, nine of each type. 
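
The pairing of each sentence with a random wordlist built from the same words can be sketched in a few lines of Python. This is only an illustration: the paper does not say how the rearrangement was done, so the use of a seeded shuffle and the helper name `make_random_list` are assumptions introduced here.

```python
import random


def make_random_list(sentence, rng, max_tries=100):
    """Return the words of `sentence` in a shuffled order that differs from the original.

    Hypothetical helper: the original lists were presumably arranged by hand.
    """
    words = sentence.lower().split()
    for _ in range(max_tries):
        shuffled = words[:]
        rng.shuffle(shuffled)
        if shuffled != words:  # reject the (rare) shuffle that reproduces the sentence
            return shuffled
    raise ValueError("could not produce a reordering of the sentence")


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sentence = "The man went to see his horse run"
random_list = make_random_list(sentence, rng)
print(" ".join(random_list))
```

A shuffle of this kind only guarantees that the word order changes; it does not check that no grammatical fragment survives by chance, which a hand-built list could control.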


Procedure. All subjects were tested individually in a familiar schoolroom. In the first part of the session 
subjects were tested for their immediate memory span and assigned accordingly to the appropriate group. 
Because the tasks were similar, this test preceding the experiment ensured that the children understood the 
test condition. The subsequent presentation of wordlists was ordered according to a random number sequence 
with the proviso that related material was not adjacent. Three practice lists preceded the test lists. The 
experimenter presented the lists once each in a loud monotone, at the rate of two words per second. Recall 
was immediate and unhurried with verbal praise given for each attempt at recall. The Hermelin & O’Connor 
method of ‘absolute’ scoring was used, i.e. one point for each word correctly recalled in its original position 
with a maximum possible score of 8 (12) for the low (high) span groups. Recall scores for each half of the 
wordlists were also compared to test the ‘echobox’ memory store hypothesis of Hermelin & O’Connor. For 
this comparison a simple count was made of the number of words recalled from the first and second halves 
of the lists without reference to order. 
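
The two scoring rules described above can be illustrated with a short Python sketch. It rests on assumptions not stated in the paper: 'original position' is read as the serial position of a word within the child's response, each word is assumed to occur only once in a list, and the function names are hypothetical.

```python
def absolute_score(presented, recalled):
    """'Absolute' scoring: one point per word recalled in its original serial position."""
    return sum(1 for p, r in zip(presented, recalled) if p == r)


def half_counts(presented, recalled):
    """Count words recalled from the first and second halves of the list, ignoring order."""
    mid = len(presented) // 2
    recalled_set = set(recalled)  # assumes no word is repeated within a list
    first = sum(1 for w in presented[:mid] if w in recalled_set)
    second = sum(1 for w in presented[mid:] if w in recalled_set)
    return first, second


presented = "the man went to see his horse run".split()
recalled = "the man see his horse run".split()  # hypothetical response omitting two words

print(absolute_score(presented, recalled))  # 2: only "the" and "man" keep their positions
print(half_counts(presented, recalled))     # (2, 4): a recency-like pattern
```

The example shows why the two measures can diverge: once a word is omitted, every later word shifts position and earns no 'absolute' credit, even though the order-free count still registers it.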


Results 


A. Low span groups. One-way analyses of variance with repeated measures were used to 
compare recall scores for the three groups and to examine differences between recall of the two 
types of lists (see Table 3 and Fig. 1). For all groups, sentences were better recalled than 
random lists, with all groups equivalent in recall of random lists. In their recall of sentences, 
autistic children were poorer than retarded children, who in turn were poorer than normal children. 
Thus although autistic children were able to improve their recall score with structured material, 
they did not benefit from this structure as much as control children. This result is similar to that 
reported by Aurnhammer-Frith (1969) and that of O’Connor & Hermelin (1967) but different 
from that in the Hermelin & O’Connor (1967 a) study. Methodological differences may underlie 
this latter discrepancy. 


Table 3. Experiment I: Comparison of recall of sentences and random lists. Low span groups: 
One-way ANOVAs (n = 14 per group) 

                   F                                     Result 
Group 
  Autistic         F = 8.14, d.f. = 1,13, P < 0.05       Random < sentence 
  Retarded         F = 19.80, d.f. = 1,13, P < 0.01      Random < sentence 
  Normal           F = 319.19, d.f. = 1,13, P < 0.01     Random < sentence 
Wordlist 
  Sentences        F = 23.67, d.f. = 2,39, P < 0.01      Autistic < retarded < normal 
  Random lists     F = 0.95, d.f. = 2,39, n.s.           Autistic = retarded = normal 
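
The degrees of freedom in Table 3 are consistent with two kinds of comparison: a within-group comparison of the two list types (d.f. = 1,13 for 14 children) and a between-groups comparison of the three groups (d.f. = 2,39 for 3 x 14 children). The sketch below shows how such F ratios could be computed. It is an illustration under that reading of the table, not the authors' own procedure, and the scores are invented purely to exercise the functions.

```python
def one_way_anova(groups):
    """Between-groups one-way ANOVA. Returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def repeated_measures_two_conditions(cond_a, cond_b):
    """Within-subject ANOVA for two conditions; F equals the squared paired t."""
    n = len(cond_a)
    diffs = [b - a for a, b in zip(cond_a, cond_b)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return n * mean_d ** 2 / var_d, 1, n - 1


# Hypothetical recall scores for 14 children (not data from the study).
random_scores = [2, 3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 2, 4, 3]
sentence_scores = [4, 5, 3, 6, 5, 3, 5, 6, 4, 4, 5, 3, 6, 5]

f_within, df1, df2 = repeated_measures_two_conditions(random_scores, sentence_scores)
print(f"within-group: F = {f_within:.2f}, d.f. = {df1},{df2}")

f_between, df1, df2 = one_way_anova([random_scores, sentence_scores, random_scores])
print(f"between-groups: F = {f_between:.2f}, d.f. = {df1},{df2}")
```

With 14 scores per condition the first call yields d.f. = 1,13, and three groups of 14 yield d.f. = 2,39, matching the pattern reported in the table.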
Figure 1. Average recall of random and sentence material by low span groups (vertical axis: recall score; 
horizontal axis: type of wordlist, random or sentences). ●, autistic subjects; ×, retarded subjects; 
○, normal subjects. 


Recall of first and second halves of wordlists: Analysis of variance using recall scores for first 
and second halves of the lists showed that all groups exhibited a significant recency effect for 
recall of random lists (Table 4). This finding is consistent with those for normal groups (e.g. 
Deese & Kaufman, 1957). For normal and retarded children, recall of the first and second halves 
of sentences was equivalent. However autistic children recalled significantly more of the later 
items in the structured sequences. This recency effect, also reported by O’Connor & Hermelin 
(1967), is characteristic of the recall of unstructured input (Blasdell & Jensen, 1970) and of the 
recall of sentences by young normal children, aged two to three years (Brown & Fraser, 1964). 


Table 4. Low span groups: the results comparing the recall scores for the first and second halves 
of sentences and random lists (n = 14) 

                   Sentences                             Random lists 
Autistic group     F = 8.72, d.f. = 1,13, P < 0.05       F = 15.75, d.f. = 1,13, P < 0.01 
                   first half < second half              first half < second half 
Retarded group     F = 3.45, d.f. = 1,13, n.s.           F = 7.09, d.f. = 1,13, P < 0.05 
                   first half = second half              first half < second half 
Normal group       F = 3.71, d.f. = 1,13, n.s.           F = 9.28, d.f. = 1,13, P < 0.01 
                   first half = second half              first half < second half 
B. High span groups. Two-way analyses of variance were carried out for high span groups and 
indicated that there were no significant differences in recall between autistic, retarded, and 
normal groups. There was however a significant effect for wordlist type with sentences 
significantly better recalled than random lists for all three groups. As with the low span group, 
structure assisted recall for these autistic children somewhat less than it did for controls. 
However it should be noted that differences were non-significant (Fig. 2). 


Figure 2. Recall of random and sentence material by high span groups (vertical axis: average recall score, 
per cent; horizontal axis: type of wordlist, random or sentences). ●, autistic subjects; ×, retarded subjects; 
○, normal subjects. 


For all groups, there were no significant differences between recall scores for the first and 
second halves of sentences. With random list halves autistic children showed a significant 
primacy effect, normal children showed a significant recency effect, and retarded children 
showed no differences. Since the group sizes for high span children were very small this 
somewhat inconsistent finding may be ascribed to chance variation. Further analyses of variance 
showed that there were no differences between high and low span groups in the recall of either 
random or sentence material. 


Experiment II 

The results of the first study indicated that autistic children were able to improve their recall 
with structured material as did retarded and normal children although for low span children at 
least, recall was poorer than that of controls. Their ability to profit by syntactical and semantic 
cues might be described as poor but not absent. Furthermore high span autistic children did not 
differ from controls and therefore any deficiency was not common to all autistic children. In the 
second study recall for another type of structured verbal input, viz. anomalous sentences, was 
compared with recall of sentences. An anomalous sentence is syntactically consistent but 
semantically meaningless. Studies with normal children aged six and over have shown that 
sentences are recalled better than anomalous sentences due to the greater redundancy of a 
wordlist that complies with both semantic and syntactic rules (Weener, 1971; Frasure & 
Entwisle, 1973; Entwisle & Frasure, 1974). Children below this age, however, did not have a 
fully developed grasp of language rules, and were less likely to differentiate between the two 
types of sentences. It was hypothesized that autistic children who lack adequate 
language-processing skills would be less likely to recognize the difference between sentences and 
anomalous sentences. They would be less likely to perceive the anomalous sentences as odd if 
they are deficient in extracting meaning, since this material is syntactically or ‘mechanically’ 
correct. Thus it was predicted that recall scores for these two types of material would not differ 
significantly for autistic children although retarded and normal children would recall more 
sentence material. 


Method 


Seven boys and three girls, appropriately matched, from each of the original low span groups were included 
as subjects. CA, MA, IQ and Digit Span characteristics were essentially the same as in the original groups of 
14 (Table 1). Wordlists were similar to those used in the first study. Two words, nouns and/or verbs, 
were changed from the original list to produce the anomalous sentences. For example, original sentence: the 
boy and girl can dig like me; anomalous sentence: the house and church can dig like me; new sentence 
(related to a sentence from the first study): the dog and cat can run like me. There were nine sentences and 
nine anomalous sentences, presented according to the same procedure as used earlier. 


Results 


One-way analyses of variance were used to analyse the data. Autistic and retarded children 
recalled the same amount of material but this was significantly less than that recalled by normal 
children (Fig. 3 and Table 5). Overall, recall scores for sentences were superior to those for 
anomalous sentences. Although all groups were better with sentences, the level of recall for 
autistic and retarded children was poorer than that of normal children, i.e. clear semantic 
information was of greater benefit to normal children. Since autistic and retarded children 
performed similarly on this task a low IQ related deficit rather than a specific processing deficit 
must be considered. The hypothesis that autistic children would not distinguish between sentences 
and anomalous sentences was not supported: meaning clearly aided recall. 


Figure 3. Recall of sentences and anomalous sentences (vertical axis: average recall score; horizontal axis: 
type of wordlist, anomalous sentences or sentences). ●, autistic subjects; ×, retarded subjects; 
○, normal subjects. 


Table 5. Experiment II: Comparison of recall of sentences and anomalous sentences. One-way 
ANOVAs (n = 10 per group) 

                        F                                    Result 
Group 
  Autistic              F = 6.47, d.f. = 1,9, P < 0.05       Anom < sentence 
  Retarded              F = 19.15, d.f. = 1,9, P < 0.01      Anom < sentence 
  Normal                F = 34.12, d.f. = 1,9, P < 0.01      Anom < sentence 
Wordlist 
  Sentences             F = 9.03, d.f. = 2,27, P < 0.01      Autistic = retarded < normal 
  Anomalous sentences   F = 14.44, d.f. = 2,27, P < 0.01     Autistic = retarded < normal 

Further comparative analysis between recall of sentences from the first and second studies 
showed that there were no apparent practice effects to bias the results. A comparison of recall 
of random lists from the first study and anomalous sentences from the second showed that all 
groups recalled anomalous sentences better than random lists. 

In summary, all groups recalled structured material better than unstructured material with 
sentences better recalled than anomalous sentences. For sentences, the recall scores of low span 
autistic children were poorer than those of retarded children and both these groups showed 
poorer performance than normal children. High span groups were equivalent in performance. 
The finding that all low span children recalled random lists equally well suggests that the 
subjects were more adequately matched for auditory memory span (with unrelated items) than 
were those of Hermelin & O’Connor. All low span groups showed a significant recency effect 
with random lists but only autistic children maintained a recency effect with sentences. Amongst 
high span groups there were no recency or primacy effects for sentences and a mixture of effects 
for random lists. 


Discussion 


It seems that two arguments may be made on the basis of these results. The first would be 
essentially similar to that of Hermelin & O’Connor (1970): that autistic children are deficient in 
their ability to comprehend language because they are unable to use structure and meaning, i.e. 
to code the verbal input in a normal way. This is reflected in their failure to make use of these 
features of verbal material in recoding for recall. Thus their recall of structured material is little 
better than that of random material in contrast to the recall performance of normal and retarded 
children. The results of this study show consistently lower levels of facilitation of recall for 
autistic children when structured material rather than random was to be recalled. Their ability to 
use semantic and syntactic cues is poor although their memory per se, as measured by recall of 
random material, is not handicapped. Recency effects in recall may be interpreted as supporting 
this argument. 

The second argument which can be developed from the same findings is that poorer 
performance is explicable in terms of a lower developmental level rather than a processing 
deficit specific to autism. This argument may be supported by reference to the finding that 
structured material did facilitate recall for autistic as well as control subjects indicating that 
active reorganization of the material was common to all groups. Thus there is evidence that 
autistic children were able to appreciate structure, and disabilities in performance relative to 
normal children could be related to a generally lower level of functioning. The complete lack of 
any differences between high span autistic and control groups further supports this contention. If 
there is a specific deficit it might be expected to appear in high span autistic children unless one 
were to argue that they represent a different diagnostic group. Since there were no differences in 
performance between high and low span groups this seems improbable at least in the context of 
this experiment. However, since the numbers in the high span groups were small the results 
should be treated with caution. The comparison of recall of sentences and anomalous sentences 
showed that all groups recalled the former better than the latter, again indicating that autistic 
children were able to use meaning to improve recall. Furthermore they were better with 
anomalous sentences than with random lists indicating the influence of syntactic structure. In 
this experiment autistic and retarded groups were equivalent in performance suggesting the 
influence of low IQ. Perhaps it is a question of a level of performance rather than a real 
difference and an area of behaviour where the influence of motivation cannot be ignored (Zigler, 
1960). 

Analysis of recency and primacy effects provided some support for the O’Connor & Hermelin 
(1967) postulation of an ‘echo-box’ type memory store in autistic children. The expected recency 
effect for random material (e.g. Blasdell & Jensen, 1970) was obtained with all low span children 
but whereas the control groups showed no word position effects with sentences the autistic 
children retained a recency effect. However this finding is not easy to reconcile with the 
evidence for recoding discussed earlier. Moreover high span autistic children showed no recency 
effects for sentences. 

We suggest that there is at present insufficient evidence from verbal recall studies to maintain 
an hypothesis of deficient feature extraction or deficient appreciation of rules in language 
amongst autistic children. Although one may feel intuitively that such an abnormality exists their 
actual deficiencies in performance may be attributable to their retarded development and not to 
deficient feature detection capacities per se. Digit span increases with developmental level 
(Aurnhammer-Frith, 1969). The effect of increased digit span and presumably developmental 
level in this study was an improvement in the performance of the autistic children relative to the 
control children to the extent of an equivalent performance level. This is comparable to 
behaviour of young normal children (Weener, 1970; Frasure & Entwisle, 1973; Entwisle and 
Frasure, 1974) who are able to recall a sentence better than a random list but the difference is 
not nearly as marked as it is 12 months later. This suggests immaturity of the system rather than 
a specific handicap. A similar developmental influence may be implicated in the recency effects 
reported. Low span autistic children showed this effect with sentences as well as random lists 
but high span children showed no such effect. Brown & Fraser (1964) reported that younger 
children showed a recency effect even with sentences in spite of evidence for verbal recoding as 
shown by retention of the correct sequential order of items. Older children do not show this 
position bias. 

Thus, the results of this investigation along with an examination of developmental findings in 
verbal recall tasks suggest that any handicaps in such tasks shown by autistic children may be 
developmental and not necessarily associated with a specific coding deficit. The results of 
Hermelin & O'Connor (1970) and Aurnhammer-Frith (1969) may be re-interpreted in 
developmental terms. Other studies may be cited which have also stressed developmental 
handicaps (e.g. Hermelin & O’Connor, 1967 b; Frith, 1970; Prior, 1975; Minz, 1976). However 
this is not to suggest that such a deficit does not exist but rather that replication of experimental 
work over a broad range of children must precede any generalizations. 




References 


AURNHAMMER-FRITH, U. (1969). Emphasis and 
meaning in recall in normal and autistic children. 
Language Speech 12, 29-38. 

Bartak, L , RUTTER, M. & Cox, A. (1975) A 
comparative study of infantile autism and specific 
developmental receptive language disorder Br J 
Psychiat 126, 127-145 

BLASDELL, R. & JENSEN, P (1970) Stress and word 
position as determinants of imitation ın first 
language learners J. Speech Hearing Res. 13, 
193-202 

Brown, R. & Fraser, C (1964). The acquisition of 
syntax. In U. Bellugi & R. Brown (eds), The 
Acquisition of Language. Yellow Springs, Ohio’ 
Antioch College Press. 

CHURCHILL, D. W. (1972). The relation of infantile 
autism and early childhood schizophrema to 
developmental language disorders of childhood. J 
Autism Childhood Schizophrenia 2, 182-197. 

DegseE, J. & KAUFMAN, R. (1957). Senal effects in 
recall of unorganized and sequentially organized 
material. J. exp. Psychol. 54, 180-187 

ENTWISLE, D. & Frasurg, N. (1974). A 
contradiction resolved Children’s processing of 
syntactic cues. Dev. Psychol. 10, 852-857. 

Frasure, N & ENTwIsce, D. (1973). Semantic and 
syntactic development Dev. Psychol. 9, 236-245 

FRITH, U. (1970). Studies in pattern detection in 
normal and autistic children. I. Immediate recall 
of auditory sequences. J abnorm. Psychol. 76, 
413-420 

HERMELIN, B. (1971). Rules and language. In
M. Rutter (ed.), Infantile Autism: Concepts,
Characteristics and Treatment. Edinburgh:
Churchill.

HERMELIN, B. & FRITH, U. (1971). Psychological
studies of childhood autism: Can autistic children
make sense of what they see and hear? J. spec.
Educ. 5, 107-117.

HERMELIN, B. & O'CONNOR, N. (1967a).
Remembering of words by psychotic and
subnormal children. Br. J. Psychol. 58, 213-218.

HERMELIN, B. & O'CONNOR, N. (1967b). Perceptual
and motor discrimination in psychotic and normal
children. J. genet. Psychol. 110, 117-125.

HERMELIN, B. & O'CONNOR, N. (1970).
Psychological Experiments with Autistic Children.
Oxford: Pergamon Press.


MEIN, R. & O'CONNOR, N. (1960). A study of oral
vocabularies of severely abnormal patients. J.
ment. Defic. Res. 4, 130-143.

MENYUK, P. & LOONEY, P. (1972). A problem of
language disorder: Length versus structure. J.
Speech Hearing Res. 15, 264-279.

MINZ, M. (1975). The use of the Leiter international
performance scale in the assessment of autistic 
children. Unpublished Honours Thesis, 
Department of Psychology, Monash University, 
Melbourne.

O'CONNOR, N. & HERMELIN, B. (1967). Auditory
and visual memory in autistic and normal children. 
J. ment. Defic. Res. 11, 126-131. 

PRIOR, M. (1975). Aspects of cognition in autism.
Unpublished Ph.D. Thesis, Department of
Psychology, Monash University, Melbourne. 

PRIOR, M., BOULTON, D., GAJZAGO, C. & PERRY,
D. (1975). The classification of childhood psychoses
by numerical taxonomy. J. Child Psychol.
Psychiat. 16, 321-330.

RUTTER, M. (1972a). The effects of language delay
on development. In M. Rutter & J. Martin (eds),
The Child with Delayed Speech. London:
Heinemann.

RUTTER, M. (1972b). Childhood schizophrenia
reconsidered. J. Autism Childhood Schizophrenia
2, 315-337.

RUTTER, M. (1974). The development of infantile
autism. Psychol. Med. 4, 147-163.

RUTTER, M., BARTAK, L. & NEWMAN, S. (1971).
Autism — a central disorder of cognition and 
language. In M. Rutter (ed.), Infantile Autism:
Concepts, Characteristics and Treatment.
Edinburgh: Churchill.

WEENER, P. (1971). Language structure and the free 
recall of verbal messages by children. Dev. 
Psychol. 5, 237-243.

WING, L. (1975). Early Childhood Autism. Oxford:
Pergamon Press. 

ZIGLER, E. (1960). Developmental versus
difference theories of mental retardation and the
problem of motivation. Am. J. ment. Defic. 73,
536-556.


Received 18 November 1976; revised version received 6 July 1977.


Requests for reprints should be addressed to Dr Margot Prior, Department of Psychology, La Trobe
University, Bundoora, Vic. 3083, Australia.


Christine Fyffe is in Counselling, Guidance and Clinical Services, Education Department, Victoria, Australia.


Br. J. Psychol. (1978), 69, 403-412. Printed in Great Britain


Book reviews 


Psychology Around the World. By Virginia Staudt Sexton & Henryk Misiak. Monterey, California: 
Brooks/Cole. 1976. Pp. viii+470. $14.95.


At first glance it might seem an unpromising idea to cram into one book an account of nearly everything 
going on in psychology almost everywhere in the world apart from the United States. The 31 chapters (29 of 
them dealing with individual countries, one with 15 Latin American states, and one with all Africa south of 
the Sahara) average a mere 14 pages each. Surprisingly, however, the enterprise comes off, mainly no doubt 
because of the high quality of the contributions. The book is not only informative, but often interesting and 
stimulating. Though intended primarily as medicine for the Americans, who are accused of disregarding 
‘almost completely research done in other countries and particularly in other languages’, it is no less useful
to psychologists elsewhere — and in this country we, perhaps, need the medicine just about as badly as the 
Americans. 

The growth of psychological activity throughout the world over the last quarter of a century is impressive.
In Latin America, for example, there are now 35 psychological journals published, and 14 national 
psychological associations. International conferences nearly always meet in advanced countries with a 
western tradition (Japan is a recent exception), so there is a good deal of ignorance among psychologists
about what is going on elsewhere. The chapters on Africa (south of the Sahara), India, Iran, Latin America, 
Pakistan and in particular China are, therefore, especially interesting. Nearly all the chapters have been 
written by psychologists of standing in their own countries. A few, like Dr Pongratz of Germany, have tried 
to pack in rather too much detail, but in the main the contributors have avoided this pitfall. In a few cases
the chapters have been written by well-informed outsiders; thus, Dr Hoorweg of the Netherlands contributed 
the chapter on Africa (south of the Sahara), and Drs Brožek and Rahmani (Czech and Rumanian
respectively) have written the excellent chapter on the Soviet Union. Brian Foss has provided a 
well-balanced and informative account of psychology in the United Kingdom. A particularly useful feature 
of the work is the information about a number of important books not yet translated into English.

I have three criticisms to make. Firstly, the book has taken too long to produce. Nearly all the articles were 
written in 1972 or 1973, so the information was three or four years out-of-date on publication. This is 
sometimes obvious to anyone who knows the countries in question. In Ireland, for example, there is no
mention of the new department of psychology at University College, Galway; and in Iran no mention of the 
new university of Isfahan. Secondly, the bibliographies are most inadequate, and readers are asked to write 
to the various authors for complete lists of the books and articles cited. Thirdly, the choice of countries is
not entirely beyond criticism. There were obviously good reasons for leaving out the United States 
altogether. But New Zealand, where there are six universities with sizeable psychology departments, and a 
respectable volume of psychological work, gets no mention; nor does Yugoslavia, where again there are 
several psychology departments with quite a long tradition of work. Armenia, though only a constituent 
republic of the USSR and not an independent country, gets a whole chapter. Why Armenia and not Georgia,
which, psychologically, is a good deal more important?

There is so much going on in psychology in so many parts of the world that perhaps the time has come
when a Year Book of Psychology would be justified, to provide information in an up-to-date form, and list 
key publications. This extremely interesting book that Sexton and Misiak have compiled suggests that the 
time is ripe for such an enterprise.
L. S. HEARNSHAW 


The Child from Five to Ten, rev. ed. By Arnold Gesell, Frances L. Ilg & Louise Bates Ames, in 
collaboration with Glenna E. Bullis. New York: Harper & Row. 1977. Pp. xv+461. $15, £7.95.


Those who own or have read the 1946 edition of this book will find little changed in the 1977 edition. The 
authors express some surprise and pleasure about this; the reviewer is not so sure about her reactions. The
plan of the book is the same in both editions. Part One is a brief introductory chapter on the ‘Cycle of
development’, the ‘Growing mind’ and ‘Parent-child-teacher relationships’. Part Two is called ‘The 
Growing Child’ and gives us descriptions of development from birth up to ten years, dealing at each year 
with ten topics, ranging from ‘Motor characteristics’, ‘Personal hygiene’, ‘Emotional expression’ on to 
‘Ethical sense and philosophical outlook’. Development in the first four years is summarized, for the benefit 
of those who have not read Infant and Child in the Culture of To-day or The First Five Years of Life. In Part 
Three, these ten topics are dealt with, from birth to ten years, to give a complete picture of motor
development, emotional expression, and so on to philosophical outlook. Inevitably, this makes the book 
seem a little repetitive. 

The authors emphasize throughout the cyclic nature of development: periods of change and instability,
followed by periods of consolidation and stability, the cycles in the first six years being as short as six
months. They give a little nod in the direction of Piaget’s processes of assimilation, accommodation and 
equilibration, but do not discuss the apparent similarities. 

They tell us a little more in 1977 about the children studied; a new sample seems to have been begun from 
1957 and followed up until 1967, with further follow-up to 16 years for the third book, Youth. The children
were of above-average intelligence, from prosperous middle-class families. There were approximately 50 
children at each age level from birth up to ten years. It is described as a longitudinal study, but it is not clear
how often the same 50 children were seen. 

The authors assure us that their general descriptions of behaviour and of the fluctuating cycles of 
development are as ‘objective’ as they could make them, and based on observable and measurable 
behaviour. They maintain that tables of figures, test results, distributions, etc., are not necessary for the 
readers at whom the book is directed, i.e. parents, teachers, doctors and others concerned with the welfare
of children. I suspect that when I read the book in 1947 I was quite happy to accept that statement. In 1977,
the absence of figures, information about tests, questionnaires and other ‘back-up’ data certainly worries me. 

The main changes between the two editions and the two samples of children are that they find 13 cycles or 
phases instead of 11 in the early years; children seem to be more variable than they had earlier thought.
Television has taken the place of radio as one of the pastimes for children, but they do not discuss its 
influence on children; families are smaller than they were, and fathers now take a more active part in 
bringing up the children. They do not mention the steadily increasing divorce rate between 1946 and 1977. 

They find that schools and methods of teaching have changed considerably, and this is the only topic on 
which they express some doubt about whether the changes are an improvement and helpful to the children. 

Were they so far ahead of the times in 1946 that these relatively slight changes in the descriptions and 
background were all that were needed? However, the book is lively and well written, with just enough 
examples of behaviour to make one aware of the real children behind it. Their emphasis on the cyclic pattern 
of development could be helpful to a parent or teacher worried by the apparent changeableness of a child 
between, say, the ages of five and seven years. But how I longed for occasional breaks in the bland 
descriptions of family life, and for a substantial appendix full of objective data. It is worth noting, however, 
that the bibliography has been really updated; most of the references given are from 1969 onwards. 

AGNES CRAWFORD 


Encyclopaedic Handbook of Medical Psychology. Edited by S. Krauss. London: Butterworth. 1976. Pp. 
xvii+585. £13.50. 


In many ways this is an intriguing book. The editor’s intention was to provide a work of reference which 
would appeal to both psychologists and psychiatrists. He proposed to give the latter a broad view of the 
psychological base of their discipline while behavioural scientists were to be introduced to the practice and 
theory of psychological medicine. 

In fact, this is quite a substantial task. It was approached by gathering over 200 predominantly short (two 
or three page) articles on a variety of topics, contributed by a great many authors of varying degrees of 
eminence. Their contributions were then arranged by alphabetical order of title and the result is this 
encyclopaedic handbook. It contains sections ranging from ‘Accident proneness’, through ‘Food habits’, 
‘Learning theory’, ‘Sleep’ and many other disparate topics to the final article on ‘Yoga therapy’.

Many potential readers would expect a book with this title to place particular emphasis on concerns such 
as depression, schizophrenia, neurosis, perception, etc. Indeed, they will not be disappointed. There are 
separate articles, for example, on the nosology, symptomatology and prognosis of schizophrenia. Yet others 
deal with the family background and object relations in schizophrenic states. However, one is also pleasantly 
surprised to find contributions with a less obvious clinical bias. The sections on ‘Noise and man’ and on 
‘Reaction time’ are written from an academic, experimental psychology background. 

Does this mean then, that the book really is encyclopaedic in scope and that the intentions behind it have 
been well met? Unfortunately not. The decision to include some of the articles seems, at times, to be a little
quixotic. It is hard to grasp, for example, how a knowledge of Rorschach Test findings in Japanese patients
is of immediate relevance to the practising psychiatrist or clinical psychologist in our culture. Similarly, 
clinicians are unlikely to need details of the psychiatric aspects of space travel (p. 521) in their everyday 
professional life. This is not to deny the importance of such matters, but rather to point to a source of
imbalance in the book as a whole. 

But perhaps this is being a little harsh. After all, the book is intended more as a work of reference than as
a manual for clinical practice. As such though, one might have expected to see separate chapters on, say, 
attitudes or ethology or non-verbal communication. Since one of the longer contributions is concerned with 
research needs in medical psychology, one would also like to have seen an accompanying chapter on 
statistics, that bête noire of so many medics and not a few psychologists. However, to say that particular
topics are not covered well may be doing no more than to illuminate the difficulties of reviewing handbooks 
or comprehensive works of reference, in general. Doubtless each reader would wish that this topic or that 
point had been made clearer or given more prominence. It is probably unreasonable to expect the editor to 
cope with all viewpoints. 

On the whole, the lack of balance in the topics selected and the variations in quality of the several 
contributions, come across as the major weaknesses of this book. The style of the articles varies from 
almost pure sociological speculation to the hard prose of a paper submitted for publication in a respected 
journal. To some extent this variation may have happened because the editorial duties were taken over by 
six consultants after the untimely death of Dr Krauss in 1973. Nevertheless, one feels that the editorial 
board should have insisted that each contributor provide a sufficient number of references for the interested 
reader. Twenty-eight of the chapters give no such list — a serious omission in a book with such laudable 
aims. 

MICHAEL HERBERT 


Humanistic Psychology: New Frontiers. Edited by Dorothy D. Nevill. New York: Gardner Press. 1977. Pp. 
xii+230. £13.00. 


The 12 contributors to this book of essays are declared evangelists for humanistic psychology. They seek to 
show that humanistic psychology sings a song of social significance, that it is proving a rich and varied 
source of insight into interaction between groups and individuals and that it is developing a worthwhile body 
of research findings and methods. 

The book is the usual edited curate’s egg. A cogent contribution by Bugental argues for a self-treatment
of depression which is nicely the opposite of the traditional ‘pulling up of one’s socks’. He sets out a good 
case for not seeking distraction from depression, not getting rid of it as soon as possible, not taking refuge in
the Protestant ethic and so forth. Simpson thoughtfully analyses the tendency of humanistic psychologists to 
deify ‘self’ to a point where the condition of others becomes a matter of scant concern. 

There are disappointing contributions. Carl Rogers analyses a dramatic event from one of his groups and
suggests that it is ‘rich in theoretical implications’. It may so be but it is not rich in theoretical inferences 
effectively drawn by Carl Rogers. 

The volume, as a whole, is unsatisfactory, yet it is difficult to specify exactly why. True, there are the 
flaws that can commonly be found in works on humanistic psychology. There is the tendency to equate the 
particular values of American culture with fundamental human values, e.g. self-actualization often means no 
more than achieving that capacity to socialize cheerfully that is much admired in the States. The writers 
congratulate themselves too often on having broken free of behaviourism — that is not much of a trick and of 
itself it merely puts the psychologist at square one. 

Perhaps what is most discomfiting about this book is that it inadvertently reveals that humanistic
psychologists have yet to produce something singular in terms of psychological theory. Their ruminations on 
the explorations of the nature of man are, at best, as good as those of good philosophers, good novelists or 
good social anthropologists. They are only impressive as psychologists, against the barren background of 
conventional psychology. At their peak they may take us as far forward as William James but no farther 
than that. 

Incidentally, the price of this book is fearsomely interesting in that it works out at around £5.60 per 
hundred pages, compared with a market average of around £3 per hundred pages. 

D. BANNISTER 


The Social Psychology of Bargaining. By I. E. Morley & G. M. Stephenson. London: Allen & Unwin. 1977.
Pp. 317. £11.50. 

Pitched at an advanced undergraduate or postgraduate level, this book addresses itself to the social
psychological factors which influence the process of bargaining, defined by the authors as ‘negotiation for
agreement’. It focuses upon collective bargaining between representatives of groups, the theme
(following Ann Douglas’s work) being that negotiations can be characterized according to change in the
balance between the interpersonal and interparty forces involved. Morley & Stephenson voice 
dissatisfaction with a literature dominated by laboratory games which make assumptions about the players’ 
motivations, and which have with little justification been described as bargaining tasks. Their dissatisfaction
leads them to attempt, ‘...to consider in some detail the logic which guides experimental research’. A
worthwhile activity in any area of psychology! 

The book may be divided roughly into two halves. The first half concerns itself with the problems involved 
in laboratory simulations of real bargaining situations, and with a critical and fairly exhaustive review of 
laboratory studies of bargaining. This part of the book I found extremely lucid and well argued. The 
classification of types of experiments applied by the authors is logical and adds clarity to the analysis they 
provide. Such an undertaking has been long overdue and I recommend anyone interested in the experimental
aspects of this area to read this book. Cutting through the dross, the authors sadly, though perhaps not
surprisingly, conclude that at the moment, ‘... there is a great deal of speculation about negotiation and very
little evidence’. On the way to this conclusion, they may, however, have lit a match in the tunnel.

The second half of the book provides a summary of about nine years’ work in the field by the authors. It 
presents the results of their own research programme (some of which is published elsewhere), including 
analyses of both laboratory and real-life negotiations. As the opportunities are severely limited (at least in 
the UK) for carrying out manipulative experimental work in real situations, ‘scenarios’ abstracted from 
real-life negotiations are used in laboratory simulations. The method of analysis used is a system of 
categories devised by Morley & Stephenson to describe the process of negotiation and labelled by them 
‘Conference Process Analysis’ (CPA). The system is not unlike Bales’s Interaction Process Analysis (IPA) 
which has held sway for nearly three decades. 

As with Bales’s system, CPA uses essentially a ‘psychological’ definition of a basic act or unit of 
classification. An act in CPA is seen as being a ‘... psychological unit which conveys a point, proposition or
single thought...’. A distinction central to CPA is between the ‘function’ of the information being
exchanged and the ‘way’ in which that information is made salient in the exchange. The authors consider 
that Bales’s groupings disguise this distinction. Their category system, as with Longabaugh’s ‘Resource 
Process Analysis’ (RPA), views social interaction as involving an exchange of resources. In RPA all acts are 
coded in terms of a ‘mode’ dimension (indicating how information is being exchanged) and a ‘resource’ 
dimension (indicating what sort of information is being exchanged). In addition to these dimensions CPA also
has a ‘referent’ dimension which is used to indicate who is being described in the information being 
exchanged. CPA is described by the authors as, ‘...a form of high inference coding which requires very
considerable training of the observer and very considerable effort on his part. (A 35 min transcript often 
takes a full day to code)’! Hard work never hurt anyone, but whether one would be better employed doing 
something else remains to be seen. Certainly with the effort involved, social psychological research would 
suggest that one might become rather more committed to the system than may be justifiable! 

In terms of reliability, consistency of a single trained observer is satisfactorily high for the three 
dimensions, mode, resource and referent. For the consistency across two trained observers, the ranking for 
reliability was in the order, mode > referent > resource, with mode and referent being satisfactory but with 
resource being rather low.

Turning to the use of CPA as a tool for the analysis of phases in negotiation, I will cite as an example one 
analysis of a real-life situation, in this case an electricians’ informal negotiation. Of the 24 categories tested 
for change across phases, only four were statistically significant at the 0.05 level or beyond, and using
multiple comparisons one of these may be significant by chance alone. This is not to say that nothing
interesting emerges from the analyses, more to emphasize that the changes appear rather subtle and are 
difficult to detect. Certainly the authors provide some support for Douglas’s ideas. 
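
The arithmetic behind the multiple-comparisons caveat can be sketched as follows. This is an illustration only, resting on two assumptions that the review does not state: that the 24 category tests are independent, and that no category truly changes across phases. The function names are hypothetical and are not taken from the book.

```python
from math import comb

N_TESTS = 24        # categories tested for change across phases (from the review)
ALPHA = 0.05        # significance level quoted in the review
N_SIGNIFICANT = 4   # categories reported as significant (from the review)


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of 'significant' results if every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k: int, n_tests: int, alpha: float) -> float:
    """Binomial probability of k or more chance 'hits' among n independent tests."""
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


# ASSUMPTION: independent tests, all null hypotheses true.
expected = expected_false_positives(N_TESTS, ALPHA)        # 24 * 0.05 = 1.2
p_any = prob_at_least(1, N_TESTS, ALPHA)                   # chance of at least one spurious hit
p_four_or_more = prob_at_least(N_SIGNIFICANT, N_TESTS, ALPHA)

print(f"expected chance hits: {expected:.1f}")
print(f"P(at least one chance hit): {p_any:.3f}")
print(f"P(four or more chance hits): {p_four_or_more:.3f}")
```

Under these assumptions about one chance result is to be expected among 24 tests, which is consistent with the remark that one of the four significant categories may be spurious, while four or more chance results together would be uncommon.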

One criticism or complaint I have to make about systems which use only an ‘act’ count, is that there is 
much evidence to suggest that participants’ subjective judgements of what is going on may bear little 
relation to the ‘reality’ being measured. For example, subjective judgements of ‘talkativeness’ appear to be 
more useful predictors of various social perceptions (such as leadership, popularity, etc.) than objective 
tallies of emitted utterance frequency. In studying interpersonal interaction, what people are perceived by 
others as doing may be more important than what they actually do. In the case of the CPA analyses one half 
or more of the categories in the referent dimension are unclassified, that is, the coder cannot be sure who, if
anyone, is referred to in the act. If as a participant, however, you perceive this ‘unclassified’ act as being 
directed toward you, then it may well influence your attitude and behaviour. 

In conclusion, if as the authors believe, ‘these rules have a generality which extends far beyond that of 
negotiation groups’, then clearly, for CPA the sky is the limit; personally I have my doubts. Nevertheless,
this is a book that should be read and inwardly digested by anyone even vaguely interested in the area of 
bargaining. 


MARTIN SHERIDAN 


New Perspectives in Personal Construct Theory. Edited by D. Bannister. London: Academic Press. 1977. Pp.
x+355. £10.80. 


No programme as sweeping and ambitious as George Kelly’s (as Bannister describes it in this book: one
which ‘attempts to deal with all aspects of human experience’) could reasonably be expected to be written 
down with complete logical consistency and free of ambiguities. As Kelly’s work is indeed no exception in
this respect, what he ‘really meant’ can become a matter for dispute between intellectual opponents who 
both draw on Kelly’s own words as evidence for their position (witness, for example, the sporadic 
controversy carried out in recent years in the British Journal of Medical Psychology between Graham 
Foulds and Mildred McCoy). However, what Kelly really meant is in some ways less important than the
uses which have been made of his work, and it is in these perhaps that one often detects an over-confidence 
in people’s capacity for self-awareness and at least potentially accurate self-description. Kelly’s supporters 
have always (not without justification) reacted angrily to the criticism that his psychology is ‘too cognitive’,
but not until the publication of this book, perhaps, have they spelt out a convincing answer to this charge. 
Indeed, there is a distinct move on the part of some contributors to this volume (particularly Radley and 
Mair) to reject entirely the notion of personal constructs as in any sense objectifiable dimensions, or verbal 
labels, and there is less reliance than usual simply on ‘implicit’ or ‘submerged’ constructs or poles of 
constructs as explanations of ‘unconscious’ construing. Instead, Mair, for example, characterizes constructs 
as ‘guises and forms through and in which the person can participate actively in experiencing and exploring
his world...’, and in this way ‘conceptual labels’ become ‘procedures’. Likewise, Radley rejects an
approach to behaviour (surely often inspired even if not supported by Kelly) as action consequent upon 
reflection, seeing constructs rather as lived anticipations. 

These chapters, together with those of Bannister, McCoy, and a hitherto unpublished essay by Kelly 
himself, form the central theoretical thrust of the book, and leave one in little doubt that if personal 
construct psychology had been nosing around the entrance to a ‘cognitive’ blind alley, the danger of its 
entering it is now significantly lessened. Kelly’s treatment of emotions, found wanting by several of his 
critics, is, for instance, defended and extended by both Bannister and McCoy. 

Some other chapters in the book (there are in all 13 contributions) show perhaps less awareness of such 
theoretical issues, and the problematic relations between action and reflection are thus not always clearly 
dealt with. Nevertheless, they exemplify an interesting and thoughtful assortment of ways in which personal 
construct psychology can be used, both technically and conceptually, in a wide variety of settings and for a 
variety of purposes. These range from attempting to understand the shared social constructions of a rather 
weird mystical community centred round a rock musician in New England (Karst & Groutt), through an 
exposure of the ways in which a detention centre for young men achieves more or less the opposite of its 
rehabilitative aims (Margaret Norris), to a consideration of what personal construct psychology can do for 
architectural design and participatory planning, and vice versa (Peter Stringer). Clinicians may be particularly 
interested in Finn Tschudi’s chapter, which is a thicket of ideas about the ways in which the personal 
construct approach can illuminate the meaning of symptoms and lend coherence to the views of a range of 
therapeutic schools. 

As a sequel to the earlier Perspectives in Personal Construct Theory, this is a particularly successful book, 
and in some ways perhaps more theoretically incisive and practically instructive than its predecessor. It will 
of course be required reading for personal construct people and for those who need to keep up with 
developments in this field whether or not they are sympathizers (this should include most social, educational 
and clinical psychologists), for this is a book which reflects progress rather than merely reiterating known
positions. It will bore and irritate those ‘accumulative fragmentalists’ who are happy to acquiesce in the 
visions offered them by traditional mechanistic psychology, but even they will have to read Kelly’s chapter 
for a succinct explanation of this pejorative label. 

D. J. SMAIL 


The Measurement of Intrapersonal Space by Grid Technique, vol. 2: Dimensions of Intrapersonal Space. Edited
by P. Slater. London: Wiley. 1977. Pp. 270. £12.00.


This is the second of Slater’s two volumes of grid methodology and is concerned specifically with 
measurement. The first volume gave examples of applications of the technique. Leaving aside the feeling of
the cart being presented before the horse, there is the added puzzle of discovering for whom the book is 
written. Slater gives us no guidance on this but instead presents an interesting account of how he, 
personally, became interested in this form of assessment, and how first the Medical Research Council and 
then St George’s Hospital Medical School gave him support. This is, in fact, basically a book on Slater’s
own work and describes how he first developed a particular method for analysing a grid matrix based on 
principal components analysis and its extension to the analysis of pairs and then groups of grids. 

The first three chapters are concerned with intrapersonal space and grid methodology from an historical 
and a practical perspective. If you are looking for answers to the ‘how do I do it’ question, you will not find
them in this book. Indeed, the reader needs to possess a fair idea about the nature of the methodology before
deriving benefit from this book. Slater provides a clear, if short, account of the bare bones of personal 
construct theory but, occasionally, goes off at a personal tangent. For instance, he claims that Kelly was not 
‘quite explicit or cogent’ in explaining the need for the dichotomy corollary and therefore advises that 
‘constructs which only offer a dichotomous contrast are best avoided’. Slater is neither explicit nor cogent in 
the reasoning which leads him to make this very misleading statement. Similarly, he argues against the
theoretical proposition that constructs are organized hierarchically. He cites no evidence to support this 
statement, nor the existing evidence indicating that the contrary may be the case. He goes on to make the
surprising statement that ‘when a wider variety of elements is introduced and a more sensitive scale is 
applied, any evidence of superordinacy/subordinacy is likely to be replaced by evidence of 
convergence/divergence’. Since few would deny that grids can be used without any recourse to construct 
theory, it is a pity that Slater deals with the issue by suggesting that supposed inadequacies in the theory 
dictate caution in its use. 

The second part of the book deals with analysis. Thirty pages are devoted to matrix algebra, coordinate 
geometry and principal component analysis. This is followed by detailed descriptions of Slater’s own method 
of grid analysis as applied to individual grids and grid comparisons. Part 3 contains chapters by other authors 
on the specific topics of structural measures, generalized personal questionnaire techniques and the analysis 
of three-way grids. I would hazard a guess that a glance at the mathematics in the latter chapter will
discourage all but the most intrepid psychologist. The last part of the book takes the form of an epilogue. 
Here Slater discusses the distinction between deterministic and teleological theories citing psychoanalysis, 
behaviourism and trait psychologies as examples of the former and construct theory of the latter. After 
arguing that such dichotomies are not as simple as they appear, he advocates that psychologists should not 
commit themselves to any one theory because this imposes limitations on one’s freedom to change one’s 
mind. Instead, Slater advocates ‘empirical eclecticism’. This will leave us free to use any language we 
desire. 

In both volumes, but particularly this one, Slater has given good evidence in favour of his argument. By 
devoting himself to one particular form of analysis, he has limited his freedom. Hence, the two books 
present only some of the existing grid methods and only one of the very many methods of grid analysis. This 
book is thus invaluable to anyone who has used or wishes to use Slater’s computer programs as it provides 
the most explicit description of the method currently available. For the many other users of grid 
methodology, the book is a disappointment. They will no doubt wish that Slater had not become totally 
committed, thereby denying them the chance of being empirical eclectics. This book might justifiably be 
renamed ‘Dimensions of Slater’s Intrapersonal Space’. 

FAY FRANSELLA 


Dimensions of Organizational Behaviour. By Theodore T. Herbert. New York: Collier Macmillan. 1977. Pp. 
xlii+530. £5.95. 


Another textbook of behavioural science applied to organizations. The importance of the text is claimed by 
the author to be that a study of human behaviour is a final requirement for understanding organizations and 
prescribing their effectiveness; such effectiveness being seen to include financial status, productivity, social 
responsibility and the utilization of human resources. From this proposition Herbert moves to a lengthy 
treatment of the usual business school material under categories of technology and structure, the individual 
as a system, the social system, and processes referred to as ‘modification and integration’. Despite a 
somewhat eclectic approach it can reasonably be said that most of the empirical and argumentative material 
originates from so-called organizational psychology. The basic perspective is that of socio-technical systems 
theory (though there is very little mention of major socio-technical studies) integrated with behavioural 
systems theory (though there is no mention at all of what has come to be known as ‘the Aston approach’). 
The moral impetus is neo-human relations. 

Though the book contains a considerable amount of material, and could possibly be used as a source 
book, its major drawback is the way in which material is selected, interpreted, and used. Technology may 
determine behaviour in organizations but since this may mean that personal needs are paid insufficient 
attention a certain degree of deliberate organizational change is necessary. To deal adequately with this 
problem it is necessary to distinguish between what is and what ought to be the case. But one is never clear 
whether the writer is attempting to describe what actually happens in organizations or what ought to happen 
if they are to function adequately, efficiently, humanely, etc. That is, no clear distinction is made between 
determining factors and factors which may be used to evaluate efficiency. One example of this lack of 
distinction is the frequent tendency to refer to problems studied by the ‘researcher’ (often a consultant) as 
organizational problems or problems of working life, when in fact they are typically problems faced by those 
in positions of authority, or those in conflict with authority. 

This implicit evaluative emphasis on efficiency may also explain the dominant emphasis on technology - 
Herbert seeing organizations as designed around technology - and by inference on economic organizations. 
But it would be difficult to extend that emphasis to all organizations and yet the book appears to utilize quite 
general scientific-theoretic perspectives from a number of disciplines. This general perspective on what is 
actually a particular phenomenon leads inevitably to overgeneralization. It is argued that organizations 
should provide for moral attachment to work — the neo-human relations emphasis in the book — but it is not 
considered whether such attachment would be appropriate or effective in all work settings, nor even whether 
it could be achieved in work settings that are similar. 

Though the socio-technical perspective of the book can take account of the relationship between the 
formal and the informal, and the intra- and extra-organizational, its ultimate weakness lies in the inadequacy 
of regarding social institutions and their participants as similar to biological organisms. Can the interpersonal 
behaviour of organizational members really be understood in terms of responses identified by the researcher 
or consultant but not necessarily by the members? 

Through the use of selected research and argument the author does manage to suggest means by which 
organizations can be more successful, or adaptive, or humane, but this selectivity plus the systems 
perspective provides very little in the way of an explanation of the causes of inefficiency, or 
non-adaptiveness, or inhumanity. Festinger may be brought in to give instruction on how to change workers’ 
attitudes, but dissonance theory does not explain why they held those initial troublesome attitudes. 
Incidentally, no mention is made of Brehm's theorizing on reactions to loss of freedom, nor is any attention 
paid to the substantial insights offered by attribution research to questions of worker–supervisor 
relationships, nor the contribution of social comparison theory to a psychological understanding of wage 
bargaining and intergroup relations. Using this book one may be able to say how but not why. 

The essential problem facing any author of this type of work is that if one avoids a structure for analysis 
then one is faced with a mass of research. One has to select but one will then be accused of bias plus lack of 
coherence. Use a structure for analysis which enables one to be selective and one will be accused of bias 
plus a failure to take into account all points of view. 

JOHN LOCKWOOD 


Biofeedback and Behavior. NATO Conference Series, III: Human Factors, vol. 2. Edited by J. Beatty & H. 
Legewie. 1977. New York. Pp. x+531. $37.50. 


This excellent, albeit expensive, volume containing the proceedings of a symposium held in Munich, 1976, is 
a valuable addition to the NATO Conference Series. It is outstanding for the quality and originality exhibited 
by the 33 chapters it comprises, though this seems less remarkable when one views the impressive list of 
contributors who are amongst the most innovative researchers within the biofeedback discipline. 

The term biofeedback refers to a set of procedures for enabling individuals to control some specified 
physiological process through the provision of an external monitor or cue to indicate the present state of that 
process. This volume vividly reflects the diverse areas of study, both experimental and clinical, that are 
unified by their common adoption of such procedures. Indeed that diversity has clearly presented some 
problems to the editors who have attempted to structure the volume by dividing the contributions between 
five sections. The introductory section contains a collection of chapters that provide some alternative 
perspectives upon the issues contained within the biofeedback discipline and illustrates the biofeedback 
method as a key to a Pandora’s Box of behavioural and physiological phenomena. The following three 
sections contain chapters devoted, for the most part, to specific and discrete methodological and theoretical 
questions and are differentiated according to the response class to which those questions are directed. 
Section Two is concerned with central nervous system events and critically examines the operant 
modification of brain wave activities, in particular the alpha, theta, and sensorimotor rhythms, and questions 
whether learned control of electroencephalographic activity has been demonstrated. This section also 
interprets the behavioural correlates of learned brain wave activities and debates their therapeutic utility. 
The third section is as varied as it is large. It addresses diverse issues connected only by their involving 
functions of the autonomic nervous system. Two dominant themes, each the focus of a number of chapters, 
are operant modification of cardiovascular responses and the treatment of cardiovascular disorders, and 
theoretical conceptions of the nature of visceral perception and learning. Section Four examines the 
application of biofeedback procedures to responses of the skeletal musculature in both normal and disabled 
individuals. The final section contains two chapters, one from each of the editors, which summarize the 
proceedings and provide a useful overview of the general questions raised by the contributors and the types 
of answers being sought. Legewie evaluates the therapeutic potential of biofeedback procedures and 
cautiously recommends their use within a comprehensive treatment approach to psychosomatic illness. 
Beatty’s discussion encompasses the use of biofeedback as a research instrument and emphasizes in 
particular its importance for answering the vital question of whether learned control of physiological 
processes can modify behaviour. 

Despite segregating the chapters according to response class, however, the volume remains a compilation 
of individual contributions of differing style, content and intent which do not easily fit together, and though 
in no sense a reference book it is an effective source book to be used judiciously. That being so the 
provision of comprehensive name and subject indices is appropriate and has been much appreciated by this 
reader. It is not the intention of this volume, either in individual chapters or its entirety, to review the 
progress of biofeedback research or to present a synthesis of its accepted truths; rather it is an up to the 
moment statement of current developments and future directions of biofeedback. The audience to which the 
book will appeal will doubtless be, for the most part, specialists interested in specific issues in biofeedback 
though it could be read with great profit by psychophysiologists and all others who wish to assess the present 
‘state of the art’. 

The objectives of the symposium as stated by the editors were to provide an overview of current research 
and to assess the clinical and experimental utility of the biofeedback method. The first objective has been 
satisfactorily achieved, though since the choice of contributions is, of necessity, selective, some will doubtless 
decry the errors of omission, though few could dispute the merit of those included. My one reservation 
concerns the volume’s preoccupation with studies using human subjects and the consequent neglect of some 
important research areas employing infrahuman subjects. Equally the second objective has been met by the 
critical examination of the available evidence for the utility of biofeedback methods and the consensus is 
that they have more limited applicability than originally thought. I found this volume especially welcome, 
however, for a less circumscribed reason. The rigorous experimental and conceptual approaches contained in 
the individual contributions together with the critical appreciation of the problems to be faced in biofeedback 
research will do much to redress the balance of the biofeedback literature which has become burdened 
during its brief history with premature and overzealous claims for biofeedback as a panacea. Whilst further 
research must be the arbiter of the fruitfulness of the individual projects presented, the impact of the volume 
as a whole is its significance as an indicator of the coming of age of the biofeedback discipline and the more 
responsible attitude that new-found maturity implies. 

KEITH PHILLIPS 


Experimental Methodology. By Larry B. Christensen. Boston, Mass.: Allyn & Bacon. 1977. Pp. x+372. £6.20. 


In an age characterized by the blurring of distinctions for political or professional gain, it was perhaps 
inevitable that the distinction between scientific and non-scientific research would become sufficiently blurred 
to encourage a wave of textbooks which would treat all research procedures in psychology as equally valid. 
This book is a clear harbinger of such a wave, which may well become a flood. 

Uncritical acceptance of procedures is made easier by the author’s deliberate attempt to eliminate from 
consideration all statistical concepts and methods, despite the intimacy of their relationship with research 
methods. What he fails to exclude, he relegates to a very short appendix. The result is a pedestrian 
presentation of current procedures and their conventional wisdom with little or no indication of main areas 
of controversy. 

Within this expository strait-jacket, the book’s main contributions to methodology are its concentration 
on the procedural aspects of the design and conduct of research, its inclusion of quasi-experimental and 
single-subject designs and a glossary. Although claiming that the book is intended to help the student gain an 
understanding of what science is, as well as of what research psychologists do, only the first two short 
chapters are concerned with the nature and objectives of science and experimental method and no attempt is 
made to show whether and to what extent the psychologist’s endeavours are scientific. The next four 
chapters deal with the formulation of hypotheses, problems of choosing independent and dependent 
variables, problems of achieving constancy and of control of extraneous variation, including randomization, 
matching and counterbalancing procedures as well as those for controlling subject bias and experimenter 
effects. 

Two chapters are devoted to designs, one covering general designs, the other covering quasi-experimental 
and single-subject designs. A chapter purporting to deal with data collection and hypothesis testing but 
which, after quickly skating over the logic of tests of statistical significance, deals only with choosing, 
briefing and debriefing subjects, is followed by a chapter on validity. Disappointingly, however, it is not 
devoted to discussion of the validity of the designs and approaches themselves but only to description of the 
distinction between internal and external validity of experiments in terms of isolating effects and generalizing 
to populations. 

A chapter on the ethics of current procedures, especially those involving deception, is equally 
disappointing. It is restricted largely to enumeration and comment upon the ten principles of the new ethical 
code adopted by the American Psychological Association in 1973 together with a review of subsequent 
attempts to measure their effects. Although some of the difficulties of carrying them out are discussed, the 
principles themselves are not questioned despite the fact that they are typical of the attempts to reduce 
dissonance, already referred to, by blurring the meanings of words. Thus, for example, Principle 4 insists 
that, because openness and honesty are essential characteristics of the relationship between investigator and 
participant, they should be restored after the experiment if the requirements of the study have necessitated 
deception. Obviously, openness and honesty can hardly be regarded as essential if circumstances can 
necessitate their abandonment, even temporarily. Clearly, ‘essential’ here means ‘desirable’ and 
‘necessitate’ means ‘if the investigator deems it necessary’ so that, despite its claim to do otherwise, the 
principle merely asserts that deception is justified if the investigator thinks it is. Considerations of this kind, 
however, do not disturb the book's smooth flow of liturgical exposition. 

A final chapter on the preparation of research reports is followed by the statistical appendix, glossary and 
bibliography. The statistical appendix is inadequate, consisting only of an introduction to types of scales, 
simple descriptive statistics and error variance. The bibliography is good but the glossary is poor, reflecting 
very clearly the concreteness manifested through the book. Consequently, most of the definitions are 
inadequate; for example, hypothetical constructs become concepts not directly observed — as though there 
are some concepts which are — intervening variables are reified into events, deductive reasoning becomes 
concerned with observations rather than propositions, hypotheses become predictions, and so on. The 
blurring of the distinction between scientific and non-scientific pursuits, manifest throughout the book, is also 
crystallized here in that the only perceived difference between hypotheses and scientific hypotheses is that, 
whereas the former are predictions, the latter are statements of predictions. 

Despite its faults, however, the book offers a useful descriptive review of current research procedures and 
of the conventional wisdom on which they are based and, at this level, it can be recommended as a likely 
front runner in this new field. But, for this reviewer, it remains an unwitting indictment of the dangerously 
loose conventions which have come to dominate psychological research; if scientists do not give first 
priority, as they once did, to honesty, objectivity and freedom from conventional prejudices, who else will? 

A. B. ROYSE 


Mind Reach: Scientists Look at Psychic Ability. By Russell Targ & Harold Puthoff. London: Jonathan Cape. 
1977. Pp. xxv+230. £4.95. 


Targ & Puthoff are two physicists from the Stanford Research Institute (SRI) in Menlo Park, California. 
They are probably best known for their work on Uri Geller, some of which was published in Nature. 
The book is written in a popular and lively style, and mainly concerns their exciting discovery of a 
repeatable paranormal phenomenon which they term ‘remote viewing’. Typically the subject is locked in a 
cubicle and a target demarcation team chooses at random a set of travelling orders. They then depart for the 
target, such as a local landmark. After a fixed period the subject describes in detail the target area the team 
is visiting. The authors insist that the subject is able to do this very accurately, in spite of the use of 
double-blind procedures which eliminate all possibility of collusion. The accuracy of the details reported by 
the subject is checked by a number of independent judges who are required to match blindly the subject's 
transcripts of a number of target locations with the appropriate location. Using this procedure one of their 
best subjects, ex-police commissioner Pat Price, was able to identify details with an accuracy which they 
claim was statistically significant at odds of 35000:1. They report that this kind of remote viewing may be 
available to most people, and that ‘so far, we cannot identify a single individual who has not succeeded in a 
remote viewing task to his own satisfaction’ (p. 90). 

In another series of experiments they found that Hella Hammid, a photographer, was able to accurately 
describe a location 30 min before that location had been selected from a stack of sealed envelopes by a 
random number generator. Again they insist that there was no possibility of collusion, and the strike success 
was later verified in blind judging without error. 

As if this were not enough, it may not even be necessary for a team to visit the target area. In their 
laboratory, New York artist Ingo Swann was apparently able to describe details of remote targets with no 
more information than the geographic latitude and longitude of each target. In one instance he was able to 
describe in detail ‘beyond what would ever be shown on any map’ features of a remote island 3000 miles 
away. Their best subjects were also able to display psychokinetic abilities; for example Price could influence 
the output of a magnetometer 4 m away in an adjoining laboratory. 

These remarkable findings almost overshadow their work with Geller which they go on to mention. The 
authors seem to be in no doubt that Geller possesses telepathic abilities, though they admit they experienced 
some difficulty in demonstrating his psychokinetic abilities; the main problem appeared to be the frequent 
and inexplicable sabotage of the recording apparatus. They appear unimpressed by the claims of the 
‘Amazing Randi’ to replicate Geller’s feats using conjuring tricks, or by the New Scientist's criticisms of 
their research, though they go into very little detail on these rather crucial problems. They also omit to 
mention Berendt’s report that Uri Geller was actually a stage magician in Israel. Presumably they felt this 
was of little relevance in view of Geller’s laboratory performances at SRI. 

Some of the experiments are certainly difficult to fault as they are presented, and the authors repeatedly 
remind us that they are experienced scientists whose concern for all possible sources of artifact verges on 
paranoia. As an interesting twist they include a chapter on the psychology of the sceptic, concluding that it 
is he who is ‘short on rigorous observation and long on theory’ (p. 178). 

Having accepted this position the authors sympathetically consider other phenomena such as astral 
projection, and point out the peaceful uses of psychic energy. For instance, they say a group of investigators 
have shown under ‘rigorous conditions’ a significant correlation between business profits and the ability of 
executives to score above chance in a precognition experiment. They also conclude that ESP has been 
shown to be effective in medical diagnosis, and possibly most intriguing of all, we can perhaps save on space 
research as Ingo Swann claims to have been able to describe details of the surface of Jupiter from his 
armchair. 

Whilst it would be easy for sceptics to dismiss the anecdotal evidence, some of the main experiments 
appear fairly impressive in their use of controls and one seems to be left in the position of either accepting 
the validity of some of these phenomena or having doubts about the integrity of certain participants in the 
research. If the authors’ statements are accurate, then the usually suggested artifacts such as unintentional 
experimenter bias and non-verbal cues do seem to be ruled out. 

Whilst devotees of the paranormal will probably find the book fascinating reading and herald it as one of 
the definitive works on the subject, I feel that those who are undecided about the existence of ESP will find 
much of it a little too far-fetched to be convincing, and those who oppose the concept would probably be 
more entertained reading a good science fiction novel. 

Incidentally, the remote viewing experiment did not work for us, but on the other hand, the print on page 
154 of my copy was badly smudged, and that was the chapter on Uri Geller! 

G. WAGSTAFF 

Theories of the mechanisms controlling food intake in animals and man have only recently become sufficiently 
systematic and complete for it to be possible to construct quantitative or logically precise computer simulations 
of hunger. This book provides a comprehensive review of current hunger models which emphasises the value 
of modelling as a technique in theory construction. The contributors are mainly experimentalists who believe 
that theorising based on experimental analysis of normal and disordered feeding control systems should be 
disciplined by attempts at computable theoretical synthesis. 

construct psychological approach has been used in a variety of settings. The range of topics is wide, some 
emphasising theory, some applications and others developments in grid technique. The papers selected for 
presentation reflect three criteria: readers would not otherwise have easy access to related work of the authors, 
topic areas should be as varied as possible, and there should be an emphasis on research work. 

In the last twenty years, repertory grid technique has been found so useful by people in a variety of disciplines 
that many different forms, and a multiplicity of measures, have been developed. This manual describes a variety 
of commonly used grid formats and measures derived from them. The authors focus on the many difficulties 
involved in grid design and administration and the assumptions underlying grid method that must be observed 
if meaningful results are to be obtained. In so doing they dwell at some length on the notions of reliability and 
validity since the questions one is asking in the context of a repertory grid differ from those asked when using 
traditional psychological ‘tests’. 

This book is the successor to a collection of essays, published in 1970, under the title ‘Perspectives in Personal 
Construct Theory’. A varied range of psychologists contribute very lively essays on topics of their personal 
choice, treating them in the light of personal construct theory. The volume contains a hitherto unpublished 
essay by George Kelly, ‘The Psychology of the Unknown’. It comprises thirteen other contributions, ranging 
across subject areas such as child–mother interaction, the use of grids with children, construct theory views of 
self, a study of a religious hippie community in construct theory terms, uses for the theory in personal life, 
Kelly’s ideas about ‘emotion’, the practical implications of the theory for psychotherapy, a grid study of 
adolescents in a detention centre and the implications of the theory for participatory democracy. 

The neuropsychology of anxiety* 


Jeffrey A. Gray 


‘Ours is the Age of Anxiety’: that seemed a suitably portentous sentence with which to open 
this Myers Lecture. But then I realized that I was out of date. The first quarter of the 20th 
century might have deserved this epithet; but one has to be badly out of touch with reality to 
doubt that we are now well into the Age of Psychosis. Alas, it is too late to change the course of 
my research and join the modern world. So I shall stick to my last, even as you mutter: ‘Ah, 
Oxford, home of lost causes, privileged haven of anxiety in a psychotic world!’ And there is 
worse to come. Here you will find no trend-setting curtsy towards humanistic — or even 
cognitive — psychology, no dirge over the body of behaviourism; but rather the old-fashioned 
belief that psychologists study behaviour, and that this is a product of the brain. 

‘That is all very well’, I can hear you object, ‘but isn’t that the same blank cheque that 
Sechenov offered us over a century ago? And shouldn’t it have been cashed by now? And isn’t it 
audacious to attempt to cash it in the name of anxiety, hallowed to the memory of Freud and 
Kierkegaard?’ Well, yes, it is the same cheque that Sechenov drafted. But experimental physics 
did not give up a century after Galileo set it moving: why should the problems of experimental 
psychology be noticeably easier to solve? And, as I hope to show in this lecture, progress has 
been made - even in dealing with such apparently nebulous concepts as anxiety. 

My own approach to this problem takes as its major premisses: (1) that one can describe the 
psychological state which constitutes anxiety by studying the behavioural effects of drugs which 
reduce anxiety; and (2) that, by studying the physiological route by which the anti-anxiety drugs 
produce their behavioural effects, one can discover the physiological substrate of anxiety. These 
premisses can be taken in a stronger and a weaker form. In their strong form (which can hardly 
be true) they would claim that the anti-anxiety drugs reduce all aspects of anxiety and only 
anxiety. In their weak form they would claim that these drugs reduce at least some aspects of 
anxiety (hence their clinical usefulness), but not necessarily all of them or only anxiety. In spite 
of the implausibility of the strong forms of our major premisses, the arguments pursued here will 
assume their truth. In this way it is possible to construct a relatively strong theory with 
comparative clarity. It can then be left to future research (assuming the general approach is 
found sufficiently convincing to warrant it) to determine the extent to which ‘anxiety’ as 
described here overlaps with ‘anxiety’ as it emerges in the theory and research of other workers. 


Learning theory background 
Our major premisses are themselves embedded in the conceptual framework provided by 
learning theory. Since this theory is based almost entirely on experimental work with animals, 
some may wish to label what follows as a rat’s eye view of the emotions (and perhaps read no 
further in consequence). However, the drugs whose behavioural and physiological effects we 
shall consider — the benzodiazepines (e.g. librium, valium), the barbiturates (e.g. sodium amytal) 
and alcohol — derive their validity for our arguments from their clinical effects in man. Thus, the 
fact that the conceptual framework of learning theory provides a comfortable and 
well-illuminated niche for the behavioural effects of these drugs is strong support for the view 
that animal learning theory is relevant to man, and in particular to his emotions. 

If we are to make efficient use of this framework, some preliminaries are in order concerning 
the treatment accorded the emotions within learning theory. As discussed elsewhere (Gray, 
1975), the most common strategy is to treat the emotions as central states elicited by 
instrumentally reinforcing stimuli or by stimuli which have been associated with such reinforcing 
stimuli. This strategy usually forms part of a version of ‘two-process theory’, according to 
which goal-directed behaviour is the outcome of an interaction between two fundamental 
learning processes, one being classical conditioning (responsible for the learning of associative 
relationships between discrete stimulus events), the other instrumental or operant conditioning 
(responsible for the establishment of behaviour patterns which affect the organism’s exposure to 
stimulus events). Within the instrumental conditioning component further subdivisions can be 
made according to the direction of change in response probability when different kinds of stimuli 
(reinforcers) are made contingent in different ways upon responses. These subdivisions are 
summarized in Table 1. According to my own version of two-process theory (Gray, 1975) the 
experimental evidence supports the view that the most appropriate classification of the 
subdivisions of instrumental learning is into two major varieties, shown in the columns of Table 
1: one (the left-hand column) is concerned with the acquisition of new responses for reward or 
for the termination or omission of punishment; the other (the right-hand column) is concerned 
with the inhibition of responses followed by punishment or the termination or omission of 
reward. 

As well as the unconditioned instrumental reinforcing events shown in Table 1, the interaction 
of the two learning processes produces ‘secondary’ or ‘conditioned’ reinforcing stimuli. These 
stimuli acquire the same kind of reinforcing property as is possessed by the unconditioned 
reinforcing event with which they are associated. In addition they acquire certain kinds of 
motivational properties which also depend on the unconditioned reinforcing event with which 
they have been paired. These motivational properties are still very much the subject of empirical 
research and their exact nature is not entirely clear (Mackintosh, 1974; Gray, 1975). 


Table 1. Instrumental reinforcing procedures with unconditioned reinforcing events (from Gray, 
1975) 





                                Outcome
Procedure        p(R)↑                     p(R)↓

Presentation     Rew                       Pun
                 (approach)                (passive avoidance)

Termination      Pun!                      Rew!
                 (escape)                  (time-out)

Omission         Pun                       Rew
                 (active avoidance)        (extinction)


Note. The abbreviations and symbols are as defined by the intersection of row (procedure) and column 
(outcome). p(R)↑: outcome is an increase in the probability of the response on which the reinforcing event is 
made contingent. p(R)↓: outcome is a decrease in the probability of this response. Dots and lines indicate 
those procedures-plus-outcomes which define a stimulus as an Sᴿ⁺ or an Sᴿ⁻, respectively. Bracketed 
phrases refer to typical learning situations in which the various reinforcing procedures are employed. Rew, 
Reward; Pun, Punishment; !, termination; ——, omission. 
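
The classification in Table 1 can also be restated as a simple lookup from procedure and reinforcing event to outcome. The following Python sketch is illustrative only: the names `Procedure`, `Event`, `TABLE_1` and `outcome` are assumptions introduced here and form no part of the original table.

```python
from enum import Enum


class Procedure(Enum):
    """How the reinforcing event is made contingent on the response."""
    PRESENTATION = "presentation"
    TERMINATION = "termination"
    OMISSION = "omission"


class Event(Enum):
    """Unconditioned reinforcing event (abbreviations as in Table 1)."""
    REWARD = "Rew"
    PUNISHMENT = "Pun"


# (procedure, event) -> (direction of change in p(R), typical learning situation)
TABLE_1 = {
    (Procedure.PRESENTATION, Event.REWARD): ("increase", "approach"),
    (Procedure.PRESENTATION, Event.PUNISHMENT): ("decrease", "passive avoidance"),
    (Procedure.TERMINATION, Event.PUNISHMENT): ("increase", "escape"),
    (Procedure.TERMINATION, Event.REWARD): ("decrease", "time-out"),
    (Procedure.OMISSION, Event.PUNISHMENT): ("increase", "active avoidance"),
    (Procedure.OMISSION, Event.REWARD): ("decrease", "extinction"),
}


def outcome(procedure: Procedure, event: Event) -> tuple:
    """Look up the outcome for one cell of Table 1."""
    return TABLE_1[(procedure, event)]


if __name__ == "__main__":
    for (procedure, event), (direction, situation) in TABLE_1.items():
        print(f"{procedure.value:<13} {event.value:<4} p(R) {direction:<9} ({situation})")
```

The three entries with an increase in p(R) correspond to the left-hand column of the table, and the three with a decrease to the right-hand column.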

The behavioural effects of anti-anxiety drugs 


After this very brief outline of the conceptual background against which I propose to summarize 
the behavioural effects of the anti-anxiety drugs, the summary itself will need to be equally brief 
— dogmatically so, I fear (see Gray, 1977, for further treatment). 

In the light, then, of the various distinctions made in the previous section, one may give a 
simple description of the behavioural effects of the anti-anxiety drugs: these agents block the 
behavioural effects of secondary punishing stimuli and secondary non-rewarding (or ‘frustrative’) 
stimuli, but of no other secondary reinforcing stimulus and of no primary reinforcing event. 

To make this conclusion more concrete, consider first experiments using electric shock as an 
aversive stimulus. The anti-anxiety drugs do not consistently alter unconditioned responses 
elicited by shock (e.g. flinching, jumping, aggressive behaviour), nor do they affect simple 
escape or active avoidance learning (reinforced by shock termination and shock omission, 
respectively: see Table 1). Suppose, however, the animal makes a response (either innate or 
previously acquired for a reward) and finds that this response is followed by shock. The 
undrugged animal will show a reduced probability of emitting this response on future occasions; 
and so will a drugged animal, but to a much smaller extent. Thus, in experiments using electric 
shock as an aversive stimulus, the anti-anxiety drugs have a highly specific behavioural effect: 
they antagonize the response suppression produced by punishment. Given that this kind of 
response suppression necessarily occurs in anticipation of the delivery of the shock, and given 
that these drugs do not alter responses directly elicited by shock, we may rephrase this 
conclusion by saying that they reduce the control of behaviour by stimuli which warn of 
impending punishment. 

A very similar conclusion can be reached when we use the omission of reward (‘frustrative 
non-reward’) as the aversive event rather than electric shock. (That frustrative non-reward is 
aversive has been shown in a large number of experiments: Gray, 1975.) Consider a rat trained 
to run in an alley for food or water reward. Administration of an anti-anxiety drug has no 
consistent effect on the animal's ability to acquire this or a range of other responses. But if the 
response is then extinguished, drugged animals take longer to give it up than do controls. As in 
the case of punishment, the animal’s ability to inhibit its behaviour in response to stimuli which 
warn of an impending aversive event (non-reward) is reduced. Also as in the case of 
punishment, there is evidence that responses directly elicited by frustrative non-reward are 
unchanged by anti-anxiety drugs. The clearest instance of this lack of effect is in Amsel & 
Roussel’s (1952) double-runway situation, in which non-reward in the first of two goalboxes 
causes animals to run faster in the second of two sequential alleys. This ‘frustration effect’ is 
unaffected by sodium amylobarbitone (there are five negative reports), although this drug reliably 
decreases the behavioural effects of anticipated non-reward under a wide range of conditions. 

A conclusion which would fit the data I have mentioned so far might be that the anti-anxiety 
drugs specifically impair an animal’s ability to inhibit instrumental behaviour. That conclusion, I 
believe, is true, but it is only part of the truth. For there are other behavioural effects of stimuli 
which warn of impending punishment or non-reward, besides behavioural inhibition, which are 
also impaired by the anti-anxiety drugs. The best example is the partial reinforcement acquisition 
effect (PRAE) or Goodrich-Haggard effect: if rats are run in a straight alley for food or water on 
a schedule of 50 per cent random partial reinforcement (PRF) and compared to controls 
receiving continuous reinforcement (CRF), they frequently show a greater running speed at the 
end of acquisition. Amsel (1962) has analysed this phenomenon as demonstrating the increased 
arousal (‘drive’) produced by anticipation of non-reward in the PRF group; and there is indeed 
clear evidence of such changes in arousal in response to both conditioned frustrative and 
conditioned punishing stimuli (Gray, 1975). Now, if the anti-anxiety drugs block only inhibitory 
changes, they should not alter the PRAE. If, on the other hand, they generally reverse any 
behavioural changes produced by conditioned frustrative stimuli, they should reduce the PRAE. 
There are several experiments which confirm the latter prediction (Gray, 1977). 


The psychology of anxiety 

It appears from the findings discussed so far that there is an important similarity between 
non-reward and punishment: responses controlled by stimuli which predict the imminent 
occurrence of these events are impaired by the anti-anxiety drugs; responses elicited by them are 
unchanged. This similarity in drug-sensitivity is paralleled by a large number of purely 
behavioural similarities between punishment and non-reward (Wagner, 1966; Gray, 1975). In 
terms of the emotions these similarities may be expressed as two functional equivalences: (i) 
between unconditioned fear and frustration, elicited by punishment and non-reward respectively; 
and (ii) between conditioned fear and frustration, elicited by secondary punishing and secondary 
frustrative stimuli, respectively. It seems, furthermore, that the two emotional states joined 
together in (i) are different from the two joined together in (ii). We have already seen that they 
differ in their response to drugs, the conditioned emotional states being sensitive to the 
anti-anxiety drugs, the unconditioned states not being sensitive. Behaviourally, there is again a 
parallel to this difference in drug sensitivity. Whereas the chief behavioural signs produced by 
unconditioned punishment and non-reward are increases in locomotor activity, attempts at 
escape and aggressive behaviour, what is observed in an animal anticipating punishment or 
non-reward is quite different: inhibition of ongoing behaviour, taking the form of freezing in 
extreme instances (Gray, 1975). 

These various lines of argument have led to the hypothesis that there are two independent 
systems: one a ‘fight/flight’ system, responsive to unconditioned punishment and non-reward, 
and the other a ‘behavioural inhibition system’ responsive to stimuli which are associated with 
these aversive events (Gray, 1972). It is the latter system which is the theme of this lecture. 
Since activity in the behavioural inhibition system appears to be selectively antagonized by the 
anti-anxiety drugs, it is a reasonable hypothesis that such activity underlies the emotion of 
anxiety. Viewed in this way, ‘anxiety’ is synonymous with ‘conditioned fear plus conditioned 
frustration’. 

It might be objected to this hypothesis that, since we are able correctly to apply the words 
‘fear’ and ‘frustration’ to the separate states to which they each apply, they cannot possibly 
refer to the same state. But this objection would be misplaced. The theory advanced here is a 
theory about the emotions, not about the words used to describe the emotions. As Schachter 
(Schachter & Singer, 1962) has shown, in labelling one’s own emotional state one takes account 
not only of the internal nature of that state, as it is available to introspection, but also of the 
situation which has given rise to that state. This, indeed, is also what we do when we label 
emotions in others, whether they are members of our own species or of a different one. The 
words ‘fear’ and ‘frustration’ are used systematically differently when experimental 
psychologists talk about rats, but this is because they take account of the operation (punishment 
or the omission of reward) to which they have exposed their subjects. These different uses do 
not render meaningless the hypothesis that the rat enters the same emotional state, regardless of 
the particular reinforcing event, punishment or non-reward, to which it has been exposed. In the 
same way, there is no difficulty in supposing that the emotional state produced by signals of 
punishment (fear) is the same as that produced by signals of omission of reward (anticipatory 
frustration) in man, even though (since one knows the events which have given rise to the 
emotional state) both the experiencing subject and an observer of the subject will use different 
words to label this state depending on those events. 

From now on, therefore, we shall use the word ‘anxiety’ to mean that emotional state to 
which secondary punishment and secondary frustrative stimuli (but not unconditioned 
punishment or frustrative non-reward) give rise; and we shall use the term ‘behavioural 
inhibition system’ to mean that system, activity in which produces the emotion of anxiety. Ex 
hypothesi the behavioural inhibition system is activated by secondary punishing and frustrative 
stimuli and its activity is reduced by the anti-anxiety drugs. 

So far, in our treatment of the behavioural inhibition system, we have considered only stimuli 
with clear-cut reinforcing effects as defined by Table 1. There is some reason, however, to add 
to the list of adequate stimuli for anxiety one further event which does not figure in that Table, 
namely, novelty. As pointed out elsewhere (Gray, 1975), novel stimuli share with conditioned 
aversive stimuli the properties of inhibiting ongoing behaviour (originally described by Pavlov as 
‘external inhibition’) and of increasing the level of arousal (Sokolov, 1960). The major effect of 
novel stimuli (almost their defining characteristic) is, however, that they attract attention to 
themselves and also increase attention to other features of the environment in which they occur 
(the phenomenon of dishabituation: Sokolov, 1960). But the same appears to be true of 
conditioned frustrative stimuli (Sutherland, 1966; McFarland, 1966). 

There are also indications that the anti-anxiety drugs reduce attention to environmental 
change, whether under conditions of novelty alone or under conditions involving frustrative 
non-reward. For example, Ison, Glass & Bohmer (1966) found that sodium amylobarbitone 
reduced the tendency of rats to enter the changed arm of a T maze. Also in the T maze, a 
number of experiments have shown a reduction in spontaneous alternation after injection of an 
anti-anxiety drug: i.e. whereas the undrugged rat tends to alternate his choices of arm, the 
drugged rat tends to repeat his first choice. In a more complex experiment involving non-reward, 
McGonigle, McFarland & Collier (1967) showed that sodium amylobarbitone reduced the 
attention paid by an animal to a second stimulus dimension, added after a first one had been 
learned on a PRF schedule in a simultaneous discrimination. 

There is a case, therefore, for the addition of novelty to the list of adequate stimuli for 
anxiety. No doubt, however, the degree of novelty must rise beyond some threshold if it is to 
provoke emotional reactions (‘surprise’ or ‘apprehension’). Furthermore, since novel stimuli also 
evoke approach behaviour, such emotional reactions are unlikely ever to be unmixed. There is 
also a case for adding to the list of outputs of the behavioural inhibition system one of switching 
attention, especially to novel features of the environment. 

Adopting these additions, we may summarize the key features of the theory of anxiety 
proposed here as follows: 

(1) Anxiety is a central state consisting of activity in a hypothetical behavioural inhibition 
system. 

(2) The adequate stimuli for activating this system are secondary punishing stimuli, secondary 
frustrative stimuli and novel stimuli. 

(3) The behavioural effects of activity in this system are inhibition of ongoing behaviour, 
increased arousal and increased attention to environmental stimuli, especially novel ones. 

(4) The function of activity in this system is to suppress existing but maladaptive behaviour 
patterns while scanning the environment for possible alternative behaviour patterns. 

No evidence is offered in support of the last of these postulates; it is put forward simply as a 
reasonable interpretation of the remaining properties attributed to the behavioural inhibition 
system. 

This, then, is the psychology of anxiety as it emerges from experiments on the behavioural 
effects of the anti-anxiety drugs in animals. There are, I am sure, many readers of this lecture 
who will be suspicious of the possibility of making any extrapolations from animals to man. 
Their caution is reasonable. But, if they will examine the conclusions summarized above, they 
will perhaps agree that they do not look too implausible as a description of anxiety and its 
precipitating circumstances in our own species. Consider also the basic finding in the study of 
the behavioural effects of the anti-anxiety drugs: that emission of a punished response is 
enhanced by these agents. The name for this phenomenon in our species is ‘Dutch courage’. It 
occurs in every species investigated, from fish to man. Anxiety — or at any rate 
be phylogenetically very old, as is the effect of drugs upon it. But that in turn s 
brain mechanisms on which these drugs act, and which underlie anxiety, are al: 
phylogenetically old and therefore amenable to experimental investigation in an 
results of such experiments we now turn. 


The septo-hippocampal system 


If the behavioural inhibition system exists, it must be in the brain. A first clue : 
comes from the behavioural effects of lesions to the brain. If the anti-anxiety d 
impairing the function of a particular brain region, then destruction of that regii 
produce effects on behaviour similar to those produced by these drugs. On the 
argument I suggested (Gray, 1970a) two particular brain structures as likely car 
site of action of the anti-anxiety drugs: the septal area and the hippocampus. 
These structures are very closely interrelated, both anatomically and physiok 
The hippocampus displays a characteristic pattern of rhythmic slow electrical a 
from about 4 to 12 Hz (the ‘hippocampal theta rhythm’) under most conditions 
animal is behaviourally active. The functional significance of this rhythm is still 
considerable dispute (e.g. Vanderwolf, Kramis, Gillespie & Bland, 1975; Landfi 
1977), but it is well established that it is controlled physiologically by pacemake 
in the medial septal area (Stumpf, 1965). Fibres from the medial septal area inn 
hippocampus diffusely after travelling in the fimbria and fornix (Meibach & Sie; 
Conversely, the major direct subcortical projection of the hippocampus is to th: 
area via the fimbria (Raisman, Cowan & Powell, 1966; DeFrance, 1976). Thus k 
anterior septal region (containing the medial and lateral septal areas) destroy bo 

Figure 1. Schematic diagram of the septo-hippocampal system. Forn., Fornix, probably < 
theta-controlling fibres from the medial septal area. Fim., Fimbria. NorAdr., The noradre 
septo-hippocampal system from the locus coeruleus. 

Table 2. Similarities between behavioural effects of anti-anxiety drugs, septal lesions, and 
hippocampal lesions 


                                           Effects of
                                  Anti-anxiety   Septal     Hippocampal
Type of task                      drugs          lesions    lesions

Food and water intake 

Rewarded running, CRF 
Rewarded barpressing, CRF 
Passive avoidance 

Motor reactions to shock 
Aggressive reactions to shock 
Aversive classical conditioning 
Escape from shock 

1-way active avoidance 

2-way active avoidance 

Barpress active avoidance 
Extinction 
Rewarded barpressing, FI or DRL 
Simultaneous discrimination 
Successive discrimination 
Double-runway FE 

Spontaneous alternation 

Note. −, Performance is generally reported to be impaired; 0, the literature fails to reveal any consistent 
changes; +, performance is generally reported to be enhanced; ?, insufficient data. 


to the hippocampus and a major output from it, as well as radically altering hippocampal 
electrical activity by permanently abolishing the theta rhythm (Stumpf, 1965; Gray, 1971a). 

We have recently completed an exhaustive review of the very large number of reports of the 
behavioural effects of septal and hippocampal lesions (Gray & McNaughton, in preparation). The 
major conclusions from this review are set out in Table 2, which also permits the reader to 
compare the septal and hippocampal syndromes with the syndrome produced by the anti-anxiety 
drugs. Two key findings emerge: (1) in the great majority of behavioural paradigms for which 
data are available on both lesions, their effects are strikingly similar; (2) whenever the effects of 
the two lesions are similar, and corresponding drug data are available, the direction of the 
behavioural change produced by the lesions is the same as that produced by the anti-anxiety 
drugs. 

The data summarized in Table 2 clearly offer a prima facie case in support of the hypothesis 
that the septal area and the hippocampus function together as an integrated ‘septo-hippocampal 
system’ (SHS), and that the anti-anxiety drugs reduce anxiety by altering the activity of the 
SHS. The research which my collaborators and I have been pursuing over the last seven years 
has been directed towards testing this hypothesis further, and towards gaining an understanding 
of the nature of the action of these drugs on the SHS. In the remainder of this lecture I shall 
outline this research and the conclusions we have been able to draw from it. 


The partial reinforcement extinction effect 


Many of the experiments which I shall describe have been concerned with the partial 
reinforcement extinction effect (PREE). As is well known (Mackintosh, 1974), animals 
reinforced on a PRF schedule subsequently display greater resistance to extinction than animals 
trained for an equal number of trials on CRF. As argued by Amsel (1962) in his application of 
frustration theory to this phenomenon, it may be considered a consequence of the fact that PRF 
animals develop tolerance for the aversive effects of frustrative non-reward. There is evidence 
that non-reward, like other stressful stimuli (e.g. cold, electric shock), activates the 
pituitary-adrenal system, as shown by a rise in corticosterone levels in blood (Levine, Goldman 
& Coover, 1972; Valero & Gray, unpublished data). Thus extinction of previously rewarded 
responses may be regarded as a model of stress-produced behaviour and the PREE as a model 
of behavioural tolerance to stress. 

There is evidence, furthermore, that the tolerance to frustrative non-reward which is 
developed by a PRF schedule can transfer to tolerance for electric shock (Brown & Wagner, 
1964). These authors also showed that tolerance to electric shock, produced by a 
counter-conditioning schedule of shock paired with food reward, can transfer to tolerance for 
non-reward, as indicated by an increased resistance to extinction. Since other workers (Weiss et 
al. 1975) have shown that similar cross-tolerance may be demonstrated between electric shock 
and cold stress, it is possible that a common physiological change underlies at least in part 
tolerance for all forms of stress, irrespective of the particular stressor used. If that is correct, 
the PREE may perhaps serve as an index of that general physiological change. It has the very 
great advantage over other similar behavioural measures that it has been the subject of a 
substantial research effort. In consequence, we know a great deal about the behavioural factors 
which affect its magnitude and we have a good theoretical understanding of how these factors 
work (Mackintosh, 1974). 

The effects of the anti-anxiety drugs on the PREE have been studied by a number of workers, 
including ourselves. The results obtained depend on a number of factors, but it is clear that 
under the right conditions administration of an anti-anxiety drug during training on a PRF 
schedule is capable of blocking the PREE and that this is not due to state-dependent learning. 
(This is not the place for a detailed discussion of what the ‘right conditions’ are. They include 
experiments run at one trial per day; or experiments with several trials per day, provided the 
number of training trials is not too few; and they are generally consistent with the view that the 
anti-anxiety drugs block Amsel’s conditioned frustration rather than Capaldi’s after-effects of 
non-reward: see Gray, 1977.) An example of this kind of result, recently obtained by Joram 
Feldon in my laboratory, is shown in Fig. 2. 


The septo-hippocampal system and the partial reinforcement extinction effect 


As we have seen, there is good reason to suppose that the anti-anxiety drugs alter behaviour by 
way of an action on the SHS. Given the control that the septal area exerts over the hippocampal 
theta rhythm (Stumpf, 1965), a natural first hypothesis is that the anti-anxiety drugs in some way 
impair this control; and this was indeed the hypothesis I proposed in 1970. In its crudest form, 
this hypothesis would hold that (a) the hippocampus cannot perform its behavioural functions 
properly without a theta rhythm, (b) septal lesions have effects like those of hippocampal 
lesions because they eliminate the theta rhythm, and (c) the anti-anxiety drugs also eliminate the 
theta rhythm. This hypothesis is not tenable. To begin with, the anti-anxiety drugs do not 
eliminate the theta rhythm (though, as we shall see, they do have other effects on this pattern of 
electrical activity). In addition, the theta rhythm accompanies not only forms of behaviour which 
are disrupted by septal and hippocampal lesions and the anti-anxiety drugs, but also forms which 
are not. Thus, if the theta rhythm plays any role in mediating the behavioural effects of the 
anti-anxiety drugs, it must be a more subtle one. 

A first hint as to what this role might be came from the observation (Gray & Ball, 1970) that 
the frequency of the theta rhythm recorded from a free-moving rat in a simple alley task varies 
predictably depending on its behaviour and on what is happening to it. In particular, we 
observed a frequency of approximately 6-7 Hz when the animal was rewarded and in the 
process of consuming the reward (water) in the goalbox; a frequency of about 9-10 Hz when a 

Figure 2. Course of extinction, at one trial per day in a straight alley, in groups of rats given continuous 
reinforcement (CRF) or partial reinforcement (PRF) during training and chlordiazepoxide (CDP), 5 mg/kg, or 
placebo throughout training and extinction. Data kindly supplied by J. Feldon. 


well-trained rat was running down the alley towards the goalbox; and an intermediate frequency 
when the animal entered a goalbox in which it expected to find water and was instead exposed to 
frustrative non-reward. The mean theta frequency under the latter conditions was found to be 
7.7 Hz, a value later confirmed by Kimsey, Dyer & Petri (1974) and Soubrié (personal 
communication). A similar value has been reported by Kurtz (1975) when sexual behaviour is 
non-rewarded. Theta frequencies at around 7.7 Hz are also seen when rats explore a novel 
environment for the first time (Gray & Ball, 1970). 

Since the anti-anxiety drugs attenuate the behavioural effects of non-reward and of novelty, 
but not those of reward, these data considerably limit the scope of any hypothesis purporting to 
relate these drug effects to the theta rhythm. They imply that any alteration produced by these 
drugs in septal control of theta is restricted to frequencies lying close to 7.7 Hz. I therefore 
proposed a ‘frequency-specific’ hypothesis of the functional significance of theta and of the 
effects on theta of the anti-anxiety drugs (Gray, 1970a). According to this hypothesis, theta 
consists of three functionally distinct frequency bands: a low frequency band (less than about 
7 Hz in the rat) is related to fixed action patterns, including consummatory behaviour; a middle 
frequency band (centred on 7.7 Hz) is related to the activity of the behavioural inhibition system 
as described in this paper; and a high frequency band (above about 8.5 Hz) is related to the 
performance of goal-directed behaviour (rewarded or active avoidance). The anti-anxiety drugs, 
on this hypothesis, alter septal control of theta only in the middle frequency band. 
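
The three bands of the frequency-specific hypothesis can be written out as a small classifier. This is a minimal sketch only: the function names `theta_band` and `drug_sensitive` are introduced here for illustration, and treating the approximate boundaries given above (about 7 Hz and about 8.5 Hz in the rat) as sharp cut-offs is an assumption, not a claim of the hypothesis.

```python
# Approximate band boundaries for the rat, as stated in the text.
# Using them as sharp cut-offs is an assumption made for illustration.
LOW_BAND_UPPER_HZ = 7.0
HIGH_BAND_LOWER_HZ = 8.5


def theta_band(freq_hz: float) -> str:
    """Assign a theta frequency to one of the three hypothesised bands."""
    if freq_hz < LOW_BAND_UPPER_HZ:
        return "low"      # fixed action patterns, consummatory behaviour
    if freq_hz > HIGH_BAND_LOWER_HZ:
        return "high"     # goal-directed behaviour (rewarded or active avoidance)
    return "middle"       # behavioural inhibition system, centred on 7.7 Hz


def drug_sensitive(freq_hz: float) -> bool:
    """On this hypothesis the anti-anxiety drugs alter septal control
    of theta only in the middle band."""
    return theta_band(freq_hz) == "middle"


if __name__ == "__main__":
    for freq in (6.5, 7.7, 9.5):
        print(freq, theta_band(freq), drug_sensitive(freq))
```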

This, then, is the hypothesis which has guided our research in the last few years. Had we 
known at the time we proposed it of the extensive work of Vanderwolf and his associates (e.g. 
Vanderwolf et al. 1975) showing a close positive correlation between theta frequency and the 
intensity of motor behaviour, we would probably have disregarded our own findings as being 
merely a consequence of the degree to which the animal was moving under our different 
conditions of observation: little when consuming a reward, perhaps more after non-reward, and 
obviously a great deal when running down the alley. Fortunately, we did not know of this work 
and instead conceived the idea that there is something special about theta frequencies close to 
7.7 Hz; in particular, that such frequencies are related to the behaviour patterns which 
anti-anxiety drugs impair. Testing this hypothesis has produced a number of findings which fit 
rather well with it; and has led us to a number of new ideas concerning the action of the 
anti-anxiety drugs and the organization of the SHS. 

We have tested our ‘frequency-specific’ hypothesis of the significance of the theta rhythm in a 
number of ways. First, it was reasoned that, if a theta frequency of 7.7 Hz is functionally part 
of a brain system which mediates responses to non-reward, then artificially inducing such a theta 
frequency (which may be done by stimulating the medial septal area at the required frequency 
through permanently implanted electrodes) should strengthen or mimic the behavioural effects of 
non-reward. This deduction was confirmed by experiments in which such ‘theta-driving’ at 
7.7 Hz during extinction after CRF increased the speed of extinction; while theta-driving at this 
frequency during acquisition on a random 50 per cent of trials (the animal being rewarded with 
water on every trial) produced a ‘pseudo-PREE’ (i.e. greater resistance to extinction than that 
shown by an unstimulated control group, neither group being stimulated during extinction; Gray, 
1972b). Using different techniques, Glazer (1974) has produced a similar pseudo-PREE by 
experimentally inducing 7.7 Hz theta. Conversely, it was reasoned that blocking the normal 
7.7 Hz response to non-reward (which can be done by means of high-frequency stimulation of the 
medial septal area) ought to reduce resistance to extinction in animals trained on a PRF 
schedule; and this deduction was also confirmed (Gray, Araujo-Silva & Quintao, 1972). 

A further way of testing our hypothesis was by lesioning the septal area or hippocampus and 
observing behaviour which had so far not been investigated after such lesions, but which was 
known to be affected by anti-anxiety drugs. The PREE is one such phenomenon. It was known 
that septal and hippocampal lesions increase resistance to extinction after CRF (as of course do 
the anti-anxiety drugs). But it is clear from Fig. 2 that the anti-anxiety drugs also reduce 
resistance to extinction after PRF training (Gray, 1969). Could this effect also be found after 
septal or hippocampal lesions? 

In a first experiment on the effects of septal lesions on the PREE, Gray et al. (1972) showed 
that this effect could indeed be obtained: resistance to extinction was increased by these lesions 
after CRF training but decreased after PRF training. A subsequent experiment by Henke (1974) 
confirmed this. In a further investigation of these effects Feldon, Rawlins and I (unpublished) 
have been looking at lesions confined to the medial or lateral septal areas respectively. This 
work has been carefully controlled electrophysiologically, a medial septal lesion being defined as 
one which virtually abolishes theta but is otherwise as small as possible, a lateral septal lesion as 
one which does not alter theta but is otherwise as large as possible. Our results (Figs 3 and 4) 
show clearly that, at a 24 hour inter-trial interval, the increased resistance to extinction seen 
after CRF training in septal animals is due to medial septal damage, but the decreased resistance 
to extinction after PRF training (and the consequent abolition of the PREE) is due to lateral 
septal damage. 

We have tried to relate this double dissociation between the behavioural effects of medial and 
lateral septal lesions to the organization of the SHS by way of the hypothesis shown in Fig. 5. 
According to this hypothesis the medial septal area is the recipient of information, conveyed via 
an unknown route by secondary frustrative stimuli, concerning the imminence of non-reward. 
This information is conveyed to the hippocampus by way of the theta-producing fibres which 
travel in the dorsal fornix (Myhrer, 1975; Rawlins, unpublished observations). The hippocampus 
has the job of inhibiting the non-rewarded behaviour (by an unknown route) while determining 
the best behavioural strategy in the changed circumstances. (This period of behavioural 
inhibition and uncertainty is subjectively experienced as anxiety.) Under conditions in which the 
best strategy is in fact to continue with the original behaviour (as on a PRF schedule), the 
hippocampus sends a message (perhaps via the fimbria) to the lateral septal area which in turn, 

Figure 3. Effects of medial septal lesions on the partial reinforcement extinction effect in the alley at one 
trial per day. CRF, Continuous reinforcement. PRF, Partial reinforcement. Data kindly supplied by 
J. Feldon. 

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Figure 4. Effects of lateral septal lesions on the partial reinforcement extinction effect in the alley at one trial 
per day. CRF, Continuous reinforcement. PRF, Partial reinforcement. Data kindly supplied by J. Feldon. 
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via septal interneurones (DeFrance, 1976), inhibits or otherwise alters the medial input to the 
hippocampus. The operation of this hippocamposeptal pathway underlies the phenomenon of 
counter-conditioning. We are at present testing this hypothesis. It may easily be generalized 
from non-reward to punishment and novelty, and we plan also to extend our experiments to deal 
with these kinds of stimuli. 

The findings which Fig. 5 attempts to integrate pose new questions about the site of action of 
the anti-anxiety drugs. It is clear that they exclude both the medial and the lateral septal regions 
as unique sites of action, since lesions to each of these areas produce only part of the 
anti-anxiety syndrome. It is possible that the drugs act on both regions, increasing resistance to 
extinction via the medial septal area and blocking counter-conditioning via the lateral septal area. 
But this hypothesis lacks parsimony. It also treats the blocking of the PREE by the anti-anxiety 
drugs, not as a secondary consequence of the general attenuation produced by these agents in 
the behavioural effects of aversive stimuli, but as a direct interference with the process of 
counter-conditioning. While this view appears to be correct with regard to lesions of the lateral 
septal area, it is somewhat counter-intuitive to suppose that anti-anxiety drugs make it harder to 
learn to tolerate stress; though this possibility should not be dismissed out of hand. A more 
attractive hypothesis, which avoids these problems, is that the drugs act directly on the 
hippocampus. Another possibility is that the anti-anxiety drugs act upon some set of inputs to 
the SHS which initiate the processes illustrated in Fig. 5. This latter possibility is made 
particularly plausible and more concrete by the data discussed in the following section. 

Figure 5. A model for the role of the septo-hippocampal system in counter-conditioning. For explanation, see 
text. 


The dorsal ascending noradrenergic bundle 


It will be recalled that our observations of changes in hippocampal theta frequency in the alley 
(Gray & Ball, 1970) showed that consummatory behaviour was associated with theta frequencies 
in the range 6-7 Hz, exploratory behaviour and reactions to non-reward with frequencies in the 
range 7-8.5 Hz (with a mean response to non-reward of 7.7 Hz), and locomotor approach 
behaviour with frequencies in the range 8.5-10.0 Hz. From these observations it was proposed 
that anti-anxiety drugs should alter septal control of theta only in the middle frequency band, 
since it is only this band which is associated with behaviour which the drugs impair. One way of 
testing this hypothesis is to measure the current threshold for eliciting theta by septal stimulation 
(‘theta-driving’) as a function of stimulation (and therefore theta) frequency, and then to look at 
the effects on the obtained function of the anti-anxiety drugs. If the frequency-specific 
hypothesis is along the right lines, we would expect the threshold for theta-driving to be 
particularly raised at a frequency of about 7.7 Hz after administration of anti-anxiety drugs. 

It should be noted that, in the undrugged free-moving male rat, there is a characteristic 
relation between theta-driving threshold and septal stimulation frequency: over the range 
6-10 Hz there is a minimum threshold located precisely at 7.7 Hz (i.e. at an inter-pulse interval 
of 130 msec) (Gray & Ball, 1970; James et al. 1977). It is intriguing that this minimum in the 
‘theta-driving curve’ is at the same frequency which is seen in response to frustrative 
non-reward. As predicted by the frequency-specific hypothesis, a representative barbiturate 
(sodium amylobarbitone), several benzodiazepines (chlordiazepoxide, diazepam, nitrazepam) and 
alcohol all selectively raise the theta-driving threshold at 7.7 Hz and eliminate the minimum 
threshold found at this frequency in the undrugged rat (Gray & Ball, 1970; Gray, McNaughton, 
James & Kelly, 1975; Nettleton, personal communication: Fig. 6). 
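
The correspondence between stimulation frequency and inter-pulse interval quoted above is simple arithmetic, and a minimal sketch may help readers who wish to convert other points on the theta-driving curve. The function name is hypothetical, and the sketch assumes only that the stimulation is a regular pulse train, so that the interval is the reciprocal of the frequency.

```python
def interpulse_interval_ms(frequency_hz: float) -> float:
    """Interval between successive pulses, in milliseconds,
    for a regular pulse train delivered at frequency_hz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_hz


# Ends of the 6-10 Hz range discussed in the text, plus the 7.7 Hz minimum.
for f in (6.0, 7.7, 10.0):
    print(f"{f:4.1f} Hz -> {interpulse_interval_ms(f):6.1f} ms")
```

On this conversion 7.7 Hz corresponds to an interval just under 130 msec, which rounds to the value given in the text.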

These results offer striking support for the frequency-specific hypothesis, and we therefore 
tried to establish their neuropharmacological basis. To this end we attempted to mimic the 
characteristic effect on the theta-driving curve of anti-anxiety drugs by using other agents with 
better understood effects on putative neurotransmitters (Gray et al. 1975; McNaughton et al. 
1977). We found that we were unable to produce this effect by altering cholinergic, serotonergic 
or dopaminergic transmission. Selective blockade of noradrenergic function, however, gave an 
effect which was highly similar to that of the anti-anxiety drugs (Fig. 6). 

These results prompted us to look for the neural substrate of the effects we had obtained by 
pharmacological blockade of noradrenergic function. The natural candidate is the dorsal 
ascending noradrenergic bundle (DANB) (Ungerstedt, 1971). This bundle originates in the locus 
coeruleus in the brain stem and innervates the whole of the forebrain, including the hippocampus 
and septal area (Fig. 7). In order to ascertain whether it was involved in the effects shown in 
Fig. 6, we used a local injection of a neurotoxin, 6-hydroxy-dopamine (6-OHDA: Ungerstedt, 
1968), which is specific to noradrenergic and dopaminergic neurones. Injection of this poison into 
the DANB produced a virtually total loss of noradrenaline in the hippocampus, with no change 
in dopamine levels and no evidence of involvement of the other major noradrenergic projection, 
the ventral bundle (Gray et al. 1975). In animals treated in this way there was no sign of a 
minimum threshold at 7.7 Hz in the theta-driving curve (Gray et al. 1975; Fig. 6). 

These results suggest that the anti-anxiety drugs exert their characteristic effects on the 
theta-driving curve by an action on the DANB. Independent evidence for this locus of action 
had earlier been obtained by Corrodi, Fuxe, Lidbrink & Olson (1971) and Lidbrink, Corrodi, 
Fuxe & Olson (1972). These workers showed that stress increases forebrain noradrenaline 
turnover and that this increase is antagonized by barbiturates, benzodiazepines and alcohol. A 
further implication of this hypothesis is that the behavioural effects of the anti-anxiety drugs 
might also be mediated by an action on the DANB (Gray et al. 1975). It follows from this 
hypothesis that destruction of the DANB by injection of 6-OHDA ought to reproduce the 
behavioural effects of these drugs. For example, in a situation involving both reward and 
non-reward, the behavioural effects of non-reward should be reduced after such a lesion, but 
those of reward should be unaltered. 

Notice that this hypothesis runs directly counter to the most accepted contemporary view of 
the function of the DANB: namely, that this pathway mediates the behavioural effects of 
reward. The latter hypothesis is based on studies of electrical self-stimulation of the brain (Stein, 
1964; Rolls, 1975) and on experiments using electrolytic lesions of the locus coeruleus (Anlezark, 
Crow & Greenway, 1973). The locus coeruleus is a very small structure, however, and 
electrolytic lesions are bound to damage other areas. Furthermore, destruction of the locus 
coeruleus also interrupts the noradrenergic innervation of the cerebellum, which also has its 
origin in this nucleus. Thus the use of stereotaxic injections of 6-OHDA is to be preferred as a 
technique for investigating this problem. 

Figure 6. Effects of various treatments on thresholds for septal driving of hippocampal theta rhythm as a 
function of stimulation frequency in the free-moving male rat. (a), (b) and (c), average results obtained from 
a group of male rats before and after drug treatment. Control injection consisted of appropriate vehicle alone. 
(a) Chlordiazepoxide HCl, n = 4, 5 mg/kg intraperitoneally (i.p.) in 1 ml/kg saline 30 min before test. (b) 
Δ9-THC, n = 4, 0.5 mg/kg i.p. in 1 ml/kg Tween 80-saline solution 20 min before test. (c) α-methyl-p-tyrosine 
(α-MPT), n = 4, 100 mg/kg i.p. in 1 ml/kg saline followed 7.25 h later by 30 mg/kg 
dl-threo-3,4-dihydroxyphenylserine (DOPS) i.p. suspended in 1 ml/kg saline; readings were taken 6.5 h after α-MPT 
and 70 min after DOPS. (d) Average of three rats given control injection of saline-ascorbic acid vehicle in 
dorsal ascending noradrenergic bundle. (e) Average of three rats injected with 6-OHDA in dorsal bundle. (a) 
and (b), ●, Drug; ○, control: (c) ●, α-MPT; ---, DOPS; ○, control. From Gray et al. (1975). 

When natural reward (as distinct from brain stimulation reward) is used, it is clear that the 
reward hypothesis of the functions of the DANB is not substantiated: after virtually complete 
destruction of the DANB with 6-OHDA animals run just as well as controls for food rewards in 
the alley (Mason & Iversen, 1975; Owen, Boarder, Gray & Fillenz, unpublished) and they press 
bars just as well in the Skinner box (Mason & Iversen, 1977). The non-reward hypothesis, in 
contrast, fares very well. Mason & Iversen (1975) showed that resistance to extinction after CRF 
in the alley is increased by DANB lesions; and Owen et al. (unpublished) not only replicated this 
effect but also showed that resistance to extinction after PRF is decreased and the PREE 
abolished (Fig. 8). Thus, in the alley PREE situation the DANB lesions mimic very well the 
behavioural effects of injection of anti-anxiety drugs (e.g. Gray, 1969; and see Fig. 2). 

Figure 7. Sagittal projection of the ascending noradrenergic pathways. The descending pathways are not 
included. The stripes indicate the major nerve terminal areas. From Ungerstedt (1971). 

Figure 8. Course of extinction, at ten trials per day, in the alley as a function of continuous reinforcement 
(CR) or partial reinforcement (PR) during training and 6-hydroxydopamine (lesion) or control injections into 
the dorsal noradrenergic bundle. The point marked ‘A’ on the abscissa is the last day of acquisition. Data 
kindly supplied by Mrs S. Owen. 

These findings support the hypothesis that the behavioural effects of the anti-anxiety drugs are 
mediated by an action on the DANB. If so, we may make use of the premiss with which this 
lecture opened: if the anti-anxiety effects of these drugs are produced by impairing activity in 
the DANB, then activity in the DANB forms part of the neural substrate of anxiety. From this 
argument we may predict that individuals who are particularly susceptible to anxiety should have 
much activity in the DANB, and those resistant to anxiety little such activity. 

This prediction is, of course, difficult to test in man. However, we have recently commenced 
testing it in rats selectively bred to be high or low on a trait of fearfulness (the Maudsley 
Reactive, MR, and Non-reactive, MNR, strains, respectively: Broadhurst, 1960, 1975). We have 
not so far directly investigated the function of the DANB in these animals. But we have 
measured the theta-driving curve in them, with encouraging results. Male MR rats resemble 
unselected rats, including the Wistar strains from which they were derived, in displaying a 
7.7 Hz minimum in the theta-driving curve; male MNR rats lack this minimum (Drewett et al. 
1977). We may deduce from this finding that the response to selective breeding for low 
fearfulness in the MNR strain has included a reduction in the activity of the DANB. We hope 
soon to investigate this deduction directly. 


Conclusion 


Let us attempt to integrate the findings discussed in the previous two sections. As shown in Fig. 
5, it is possible to account for most of these findings if we assume that the primary action of the 
anti-anxiety drugs is at the level of the DANB (though it should be noted that we have no 
knowledge of the way in which these drugs act on this pathway, nor whether they do so directly 
or by way of some further, ‘more primary’ structure). In this way these drugs alter the 
noradrenergic input to the SHS, removing the selective facilitation which this input confers on 
theta rhythms in the 7.7 Hz band. The target areas within the SHS whose input is thus altered 
may consist of the septal area (both medial and lateral), the hippocampal formation, or both 
(Fig. 1). The system made up of the DANB and the SHS has the task of inhibiting ongoing 
behaviour upon the receipt of information about secondary punishing, secondary frustrative or 
novel stimuli (though most of our data concern frustration only). This function involves the 
medial but not the lateral septal nucleus. The DANB-SHS system also has the job of deciding 
whether, in the light of the changed circumstances signalled via the medial septal area, the 
behaviour pattern which was originally in progress should be permanently abandoned 
(passive avoidance, extinction) or persisted in (PREE, persistence in the face of stress). The 
latter outcome involves counter-conditioning (the development of tolerance for stress), which 
may depend on signals from the hippocampus travelling in the fimbria to the lateral septal area. 
This, then, is our model of anxiety. At the psychological level, it is an internal state entered 
upon receipt of stimuli associated with punishment or frustrative non-reward or novel stimuli; 
whose behavioural effects consist of inhibition of ongoing behaviour, heightened arousal and 
heightened attention to environmental stimuli, especially novel ones; and whose function is to 
enable the organism to decide whether or not the changed circumstances require an alteration 
of existing behaviour patterns. At the physiological level, it consists of activity in the 
DANB-SHS system (Figs 1 and 4) as discussed above. There is much that has been left out of 
this account. At the psychological level, we have ignored the stimuli which occur during social 
interaction, and which appear to be of great importance for anxiety (Gray, 1971 b, 1976). At the 
physiological level, we have ignored the probable role played by the serotonergic mechanisms 
which also innervate the SHS (Stein, Wise & Berger, 1973; Gray, 1976). And at the level of 
personality theory, we have left out the hypothesis (and some recent evidence supporting it) that 
human beings who are particularly prone to manifest clinical syndromes involving anxiety are 
highly sensitive to stimuli associated with punishment or non-reward (Gray, 1970b). But I hope 
that enough has been said to convince the reader of the value of the general approach to anxiety 
of which this paper is a brief token. 


A new look at the effects of noise upon performance 
E. C. Poulton 


A composite theory of human performance in noise 


Broadbent’s (1977) reply to the Myers lecture (Poulton, 1977 c) should be judged in the context 
of a new composite theory which appears to account for all the known effects of noise upon 
performance (Poulton, 1978 a). The four main determinants are: (1) Masking, both of acoustic 
cues and of inner speech (Poulton, 1977 b). (2) Distraction. (3) A beneficial increase in arousal 
when the noise is first switched on, which gradually lessens (Poulton, 1978 b, Table 1) and falls 
below normal to produce a decrement in performance when the noise is first switched off (Glass 
& Singer, 1972; Hartley, 1973 quoted in Broadbent’s point (g); Wohlwill, Nasar, De Joy & 
Foruzani, 1976). (4) Positive and negative transfer from noise to quiet. Positive transfer from 
the better learning of the task in noise under the influence of the increase in arousal (Poulton & 
Edwards, 1978). Negative transfer from the techniques of performance used in noise to 
counteract the masking or distraction, when they are not appropriate in quiet (Broadbent, 1958; 
Hartley, 1973 quoted in Broadbent's point (h); Jerison, 1959, Expt. 2). The reliable 
improvements, decrements, and unreliable effects of noise can all be predicted from these four 
main determinants, working either separately or together. 

On this composite view, Broadbent & Gregory’s (1963, 1965) reliable changes of confidence in 
noise are ascribed to the blurring of the distinctions between the criteria used in deciding on the 
categories of confidence, which results from the masking of inner speech in verbal working 
memory. The funnelling of attention in noise is ascribed to the increase in arousal produced by 
the noise. 


Broadbent’s 20 dials and 20 lights tasks 


Broadbent (1977) starts by discussing the acoustic clicks which my evidence indicates come from 
the controls of his 20 dials task (Broadbent, 1954), and which are masked by the noise (Poulton, 
1977 b). He gives five reasons why the clicks are of no help in quiet. (a) Responses to needle 
movements which the man actually sees, are not reliably delayed. But this negative result must, 
in my view, be due to the insensitive measure of performance used. The only information which 
Broadbent gives on seen movements is in his 1950 experiment in quiet. Here only about 
one-quarter of the needle movements are seen. This is too small a number to show a reliable 
effect of noise. (d) After the 1.5 h practice test in quiet without knowledge of results, other than 
the visual feedback following each response which is inherent in the display, the man never 
confuses the direction in which to move the control. (e) If he did so, he would break it. But this 
is not necessarily so. People often try to move controls in the wrong direction without breaking 
them. Also practice without knowledge of results can leave a man as confused at the end as he 
was at the beginning. It may simply encourage him to rely upon the click cue which he hears in 
quiet when he moves the control in the correct direction, but which is masked by the noise. 

(b) His 20 lights task (Broadbent, 1954) shows no overall reliable effect of noise. However, he 
claims a reliable increase in delayed responses in the noise on some parts of the display. Yet 
here the response is a straightforward one, which should not be upset by masking. He achieves 
reliability only by selecting (1) a post hoc criterion of 4.0 sec for delays, instead of the 
previously used criterion of 9.0 sec or any other reasonable time interval; (2) the second 
experimental periods in noise and in quiet instead of the first periods, or the combined first and 
second periods; and (3) the ten central lights instead of the ten peripheral lights, or the ten lights 
to one side or the other. These three post hoc selections increase the probability of finding a 
reliable result perhaps 4 × 3 × 4 = 48 times, compared with his P value of about 0.025 or 1 in 40. 
Thus the increase in delayed responses is likely to be due to chance. Also in claiming this 
detrimental effect of noise, Broadbent (1977) does not mention the direction of the difference. In 
my view, on a strict theory of ‘funnel vision’ in noise, the ten central lights should if anything 
show a decrease in delayed responses, not an increase. 
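
The arithmetic behind this objection can be sketched as follows. The sketch assumes, purely for illustration, that the 48 possible selections behave like independent tests, each carried out at the reported P of about 0.025. The argument above does not state this, and the selections are in practice likely to be correlated, so the figures are no more than a rough guide. The function names are hypothetical.

```python
def n_selections(*options: int) -> int:
    """Number of distinct analyses available when each successive
    choice offers the given number of options."""
    total = 1
    for k in options:
        total *= k
    return total


def chance_of_spurious_result(alpha: float, n_tests: int) -> float:
    """Probability of at least one 'reliable' result arising by chance,
    ASSUMING n_tests independent tests at level alpha (illustrative only)."""
    return 1.0 - (1.0 - alpha) ** n_tests


n = n_selections(4, 3, 4)  # the 4 x 3 x 4 count given in the text
alpha = 0.025              # the reported P value, about 1 in 40

print(n)                                    # number of possible selections
print(n * alpha)                            # expected number of chance 'reliable' results
print(chance_of_spurious_result(alpha, n))  # chance of at least one such result
```

Under the independence assumption, more than one 'reliable' result would be expected by chance alone among the 48 selections, which is the point of comparing 48 with 1 in 40.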

(c) The effect of noise need not be due to masking the acoustic feedback from the response 
(Broadbent & Gregory, 1965). But here the reliable change in confidence reported in noise is 
likely to be due to the masking of inner speech (Poulton, 1978 b, Table 1). 


The five-choice task of serial reaction 


Broadbent (1977) also gives four reasons why his reliable increase in errors on the five-choice 
task (Broadbent, 1953) is not due to the partial masking of the pitch of the taps. The pitch is 
lower when the man misses a brass disc and hits instead the paxolin board in which the discs are 
mounted. (f) There is no reliable increase in gaps of 2.0 sec or longer between responses in 
noise. But it is seldom that any independent variable produces reliable changes on all possible 
measures of performance. Using the separated version of the five-choice task, instead of the 
integrated version used by Broadbent in noise, Hartley (1973, 1974; Hartley & Carpenter, 1974) 
reports reliable increases in gaps of 1.5 sec between responses in noise in all his experiments. 

(g) Men perform worse in quiet just after being exposed to noise (Hartley, 1973). But this is 
due to the fall in arousal below normal when the noise is first switched off. It has nothing to do 
with masking (Poulton, 1978 a). 

(h) Errors with short response times are found in Hartley’s (1973) experiment in quiet as well 
as in noise. In noise they occur when the man cannot see whether or not he hit the disc, and so 
hits the disc quickly a second time without checking to see whether the light has changed. When 
he hit the disc the first time, the second tap counts as an error to the next light with a short 
response time (Poulton, 1977 b). In quiet, errors with short response times occur when this 
strategy transfers inappropriately from previous periods in noise (Poulton, 1978 a). Transfer is 
not, of course, 100 per cent. But this does not mean that the responses should take any longer 
when they do occur, as Broadbent suggests that they should. 

(i) Wearing ear defenders reliably reduces the difference in gaps of 1.5 sec between noise and 
quiet during the first 20 min of the 40 min test (Hartley, 1974, Expt. 2). But this is because the 
arousal produced by the noise encourages the man to tap harder with ear defenders to make the 
taps sound as loud as they do without. The ratio of signal intensity to noise intensity is thus 
about the same in noise with ear defenders as it is in quiet with or without ear defenders. The 
compensation fails during the second 20 min as the level of arousal returns toward the resting 
level (Poulton, 1978 b, Table 1). 

Broadbent contrasts this result with Hartley & Carpenter’s (1974) failure to find a reliable 
change in the difference between noise and quiet when headphones are used instead of 
loudspeakers. He states that the difference between this result and the result above for ear 
defenders is not predicted by the changes in the acoustic cues. Yet all Hartley & Carpenter’s 
reliable differences, for both gaps and errors, fit the predictions (Poulton, 1978 b, Table 1). Some 
reliable differences which might have been expected on the basis of the predictions are absent, 
but it is rare that, in a complex experiment like Hartley’s, reliable results on all possible 
predictions are obtained. 


Other points 

Some of the most misleading statements of Broadbent (1976) are refuted in Poulton (1977 a). 
However, once accepted, a theory is not ousted by conflicting evidence, only by a better theory. 
It is submitted that the composite theory of human performance in noise describes the data more 
comprehensively and more cleanly than Broadbent’s ‘three effects’ of noise. Readers are 
therefore advised to await the paper describing the composite theory in full (Poulton, 1978 a). 

This rejoinder has been drafted five times to meet the requirements of the Editor and of his 
two impartial referees, who have held extensive discussions with both Donald Broadbent and 
myself. Some of the points which have not been accepted here will be found in the rejoinder 
published in The Psychological Bulletin (Poulton, 1978 b). 


A further test of the golden section hypothesis 


J. Adams-Webber 


This research examines further the hypothesis that subjects tend to allot figures to the negative poles of 
constructs approximately 38 per cent of the time (Benjafield & Adams-Webber, 1976). Sixty Canadian 
undergraduates (30 women and 30 men) judged 20 nonsense words (e.g. JOHZAN) as if these were the 
names of people on 20 bipolar constructs. Ten of these constructs contained positive poles which were E+ 
and negative poles which were E- (e.g. kind-not kind) and the other ten constructs had positive poles which 
were E- and negative poles which were E+ (e.g. sad-not sad). These procedures were developed by Eiser 
& Mower White (1973). The results of this experiment, and a reanalysis of those of Eiser & Mower White, 
clearly supported the hypothesis. 


It has been shown repeatedly that when subjects categorize acquaintances in terms of bipolar 
dimensions, or ‘constructs’, consisting of pairs of contrasting adjectives (e.g. happy-sad), they 
tend to allot these figures to the positive poles (e.g. happy) approximately 62 per cent of the time 
(Adams-Webber & Benjafield, 1973; Benjafield & Adams-Webber, 1975, 1976; Benjafield & 
Green, 1978). This particular proportion is known as the ‘golden section’ (or ‘golden mean’). 
The golden section is defined as the proportion between two values A and B whenever 
A/B = B/(A + B). In order for this relation to obtain, A must be approximately 62 per cent (0.618) 
of B. Berlyne (1971) points out that this proportion is related to the concept of average 
information, as well as to Frank’s (1959, 1964) index of strikingness. 
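
The figure of 0.618 follows directly from the defining relation, as the short sketch below illustrates. Writing r for the ratio A/B is our own notation, introduced only for this illustration: the relation A/B = B/(A + B) then becomes r = 1/(1 + r), that is, r² + r - 1 = 0, and the positive root is taken.

```python
import math

# Let r = A / B. The defining relation A/B = B/(A + B) becomes r = 1 / (1 + r),
# i.e. r**2 + r - 1 = 0, whose positive root is (sqrt(5) - 1) / 2.
r = (math.sqrt(5) - 1) / 2

major_share = 1 / (1 + r)  # B / (A + B): the larger part as a share of the whole
minor_share = r / (1 + r)  # A / (A + B): the smaller part as a share of the whole

print(round(r, 3), round(major_share, 3), round(minor_share, 3))
```

The larger part thus occupies about 62 per cent of the whole and the smaller part about 38 per cent, which are the proportions referred to throughout this paper.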

Kelly (1955, 1969) assumes a priori that the probability that a subject will allot a figure to a 
given pole of a construct in his binary repertory grid test (described by Adams-Webber, 1970) is 
the same as the probability that he will allot that figure to the opposite pole (cf. Benjafield & 
Adams-Webber, 1975). If this assumption were correct, then the contribution of both categories 
of response to average information would be the same. That is, given an event with two possible 
outcomes, each with the same likelihood of occurring, the amount of potential information 
associated with that event will be equal to the logarithm (to the base 2) of 1/P_i, where P_i is the 
probability of each outcome, i.e. 1/2. Thus, H (the amount of information in bits) = log 1/0.5 = 1. 
This is the maximum value of H for any dichotomous distribution (Garner, 1962). 

On the other hand, the evidence cited above indicates that the distribution which we can 
expect on empirical grounds is not 50-50, but rather approximately 62-38. Whenever the 
alternative categories of response have, as in this case, different relative frequencies, we must 
distinguish between the amount of potential information (uncertainty) associated with a single 
category of response and average uncertainty. We first determine the amount of uncertainty 
associated with each type of response separately, and then obtain a weighted average of these 
two values. More precisely, the Shannon-Wiener measure of average information is based on 
the sum of the separate estimates of uncertainty, each multiplied by its own probability of 
occurrence as a weighting factor: H = Σ P_i log 1/P_i (Attneave, 1959, p. 8). 

Frank (1959, 1964), summarized by Berlyne (1971), proposes that the ‘strikingness’ of an event 
can be measured in terms of its informational content, defined as log 1/P_i, and its relative 
frequency of occurrence, P_i. His specific index is obtained by multiplying these two values 
together, i.e. P_i log 1/P_i. It follows that the strikingness of an event as operationally defined by 
Frank is equivalent to its contribution to average information in terms of the Shannon-Wiener 
formula. The quantity P_i log 1/P_i reaches its maximum value when P_i is approximately 0.368 
(Berlyne, 1971, p. 231), which is very close to the minor element in the golden section (0.382). 
Thus, as Berlyne (1971, p. 232) points out, the golden section ‘allows the minor element to 
occupy that proportion of the whole that makes it maximally striking’. Benjafield & 
Adams-Webber (1976, p. 14) suggest, in the light of this relationship, that subjects tend to allot 
figures to the negative poles of constructs approximately 38 per cent of the time so as to render 
their negative judgements, taken as a whole, maximally striking. 
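
The quantities discussed in the last three paragraphs can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, logarithms are taken to the base 2 as in the text, and the location of the maximum is taken as 1/e, which is where P log 1/P peaks whatever the base of the logarithm.

```python
import math


def strikingness(p: float) -> float:
    """Frank's index as described in the text: P * log2(1/P)."""
    return p * math.log2(1.0 / p)


def average_information(probs) -> float:
    """Shannon-Wiener average information, H = sum of P * log2(1/P), in bits."""
    return sum(strikingness(p) for p in probs if p > 0)


p_max = 1.0 / math.e                   # where P * log(1/P) is greatest, about 0.368
golden_minor = (3 - math.sqrt(5)) / 2  # minor element of the golden section, about 0.382

print(average_information([0.5, 0.5]))    # equiprobable poles, as Kelly assumes
print(average_information([0.62, 0.38]))  # approximate empirical split
print(strikingness(p_max), strikingness(golden_minor))
```

The two strikingness values printed on the last line differ only slightly, which is the sense in which the minor element of the golden section is 'very close' to the maximally striking proportion.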

The results of an experiment by Eiser & Mower White (1973) support this ‘golden section 
hypothesis’. Sixty British schoolchildren judged 20 nonsense words (e.g. JOHZAN) as if they 
were the names of persons on the basis of 20 constructs. Each subject judged ten ‘names’ on 
constructs containing positive poles which were evaluatively positive (E+) and negative poles 
which were evaluatively negative (E-), e.g. happy-not happy. The other ten ‘names’ were 
judged on constructs with positive poles which were E- and negative poles which were E+, e.g. 
rude-not rude. This procedure was designed in part to test Boucher & Osgood’s (1969) 
‘Pollyanna hypothesis’, which asserts that there is ‘an universal human tendency to use E+ 
words more frequently and diversely than E- words’. This hypothesis predicts that subjects 
should allot more ‘names’ to the E+ poles of constructs; however, Eiser & Mower White found 
no significant difference between the number of E+ and E- responses. On the other hand, our 
own reanalysis of their data revealed that the children allotted exactly 38 per cent of the ‘names’ 
to the negative poles of constructs irrespective of whether they were E+ or E-. Although this is 
at best a post hoc interpretation, the observed proportion of negative responses is precisely what 
we should expect on the basis of the golden section hypothesis. Therefore, in the experiment 
reported below, Eiser & Mower White’s procedures were replicated in order to provide a further 
test of the golden section hypothesis. 


Method 


The subjects were 60 Canadian undergraduates (30 women, 30 men), aged 17-24, enrolled in an introductory 
psychology course. Each subject was tested individually by the same experimenter, using the same procedures 
as Eiser & Mower White (1973). Subjects were presented with a list of 20 nonsense words (formed by 
combining pairs of CVCs from Noble’s (1961) list with m’ scores between 1.70 and 1.91) and asked to 
imagine that these were the names of real people whom they would be asked to describe. They were also 
given a list of 20 bipolar constructs, each of which consisted of a single trait adjective (positive response 
category) and the same adjective preceded by not (negative response category). A different ‘name’ was 
printed beside each construct, and the subject was asked to circle that pole of the construct which seemed to 
describe that ‘person’ better. On ten constructs the positive poles were E+ and the negative poles were E-, 
so that the subject’s responses could be classified as either PE+, e.g. ‘clever’, or NE-, e.g. ‘not clever’. On 
the other ten constructs, the positive poles were E- and the negative poles were E+, so that the subject’s 
responses were either PE-, e.g. ‘stupid’, or NE+, e.g. ‘not stupid’. 

There were two versions of this questionnaire, the second of which contained all the antonyms of the 
adjectives in the first. Half of the subjects (15 women, 15 men) were supplied with each version. Constructs 
were presented in a random order in both versions, and the order of the ‘names’ was rotated over subjects. 
The positive poles of constructs appeared to the left of the negative poles half the time, and the E+ poles 
also appeared to the left of the E- poles half the time. The ‘names’ and constructs in both versions are 
listed by Eiser & Mower White. 


Results 


The numbers of PE+, NE+, PE−, and NE− responses were determined separately for each 
subject. The means of these scores for all 60 subjects are reported in Table 1, together with the 
corresponding values for Eiser & Mower White’s data. The overall proportion of negative 
responses was 0.366 (S.D. = 0.13), which is quite close to both that obtained by Eiser & Mower 
White (0.380) and the minor element of the golden section (0.382). The difference between the 
total number of positive and negative responses was highly significant (z = 7.90, P < 0.001), as it 
was in the previous study (z = 5.59, P < 0.001). In neither experiment did the observed 
proportion of negative responses differ significantly from 38 per cent. 
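
As a rough check on these figures, the sketch below derives the minor element of the golden section and compares the reported mean proportion of negative responses with both 0.5 and the golden-section value. The paper does not say which z test was used; a one-sample z test on the subjects' proportions is assumed here, and with the rounded summary statistics it gives a value close to, though not identical with, the reported z for the positive-negative difference. The function name `one_sample_z` is illustrative.

```python
import math

# Golden section: a unit length divided so that whole/major equals major/minor.
phi = (1 + math.sqrt(5)) / 2
major = 1 / phi      # about 0.618
minor = 1 - major    # about 0.382

# Summary statistics as reported in the text for the present study.
n_subjects = 60
mean_negative = 0.366
sd_negative = 0.13

def one_sample_z(mean: float, sd: float, n: int, mu0: float) -> float:
    """z statistic for a sample mean against a hypothesised value mu0 (assumed test)."""
    return (mean - mu0) / (sd / math.sqrt(n))

z_vs_half = one_sample_z(mean_negative, sd_negative, n_subjects, 0.5)
z_vs_golden = one_sample_z(mean_negative, sd_negative, n_subjects, minor)
print(round(minor, 3), round(z_vs_half, 2), round(z_vs_golden, 2))
```

On these assumptions the departure from an even split is large, whereas the departure from the golden-section value falls well inside the conventional 1.96 criterion.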

Table 1. Comparisons between the results of the present study and those of Eiser & Mower 
White (1973) 

                 Present study        Eiser & Mower White 
Response 
category         %       Mean         %       Mean 
PE+              33       6.65        30       5.93 
PE−              30       6.02        32       6.47 
NE+              20       3.93        18       3.53 
NE−              17       3.40        20       4.07 
Total P          63      12.67        62      12.40 
Total N          37       7.33        38       7.60 
Total E+         53      10.59        47       9.46 
Total E−         47       9.42        53      10.54 

As in Eiser & Mower White’s study, the distribution of E+ and E− responses was 
approximately 53–47. There were more E+ responses than E− ones in the present study 
(z = 1.76, n.s.); whereas there were more E− than E+ responses in the earlier one (z = 1.78, 
n.s.). Thus, the proportion of E+ responses does not differ significantly from 50 per cent in 
either set of data. In both studies the number of NE+ responses was about the same as the 
number of NE− responses. No significant differences were observed in either study between 
female and male subjects, or between the two versions of the questionnaire in terms of the 
proportion of either N or E+ responses. Cochran Q tests revealed no significant differences 
among ‘names’ in the numbers of N responses they elicited in either this experiment (Q = 13.73, 
d.f. = 19, n.s.) or the previous one (Q = 13.79, d.f. = 19, n.s.). 


Discussion 


These data provide very strong support for the hypothesis that subjects tend to allot figures to 
the negative poles of constructs approximately 38 per cent of the time (Benjafield & 
Adams-Webber, 1976). The fact that comparable results have been obtained with repertory grid 
tests and Eiser & Mower White’s questionnaire format, in which each figure is judged on only 
one construct and each construct is applied to only one figure, indicates that previous findings 
are not merely artifacts of repertory grid technique. It is also important that the golden section 
hypothesis seems to hold when constructs have been completely counterbalanced in terms of 
connotative meaning. Since only imaginary figures were used in this study, and no significant 
differences were found among these figures in terms of the number of times which subjects 
allotted them to the negative poles of constructs, it can be argued that the hypothesis applies to 
the way in which subjects use bipolar dimensions in general, and not just to how they construe 
their personal acquaintances. Finally, the generality of the golden section hypothesis is enhanced 
by the fact that comparable results have now been obtained using different measurement 
procedures and different sets of constructs with both schoolchildren and university students in 
each of two countries (Adams-Webber & Benjafield, 1973; Eiser & Mower White, 1973; 
Benjafield & Adams-Webber, 1975, 1976; Benjafield & Green, 1978). 


Acknowledgements 


This research was supported by a Canada Council Research Grant (S76-0379). The author is also grateful to 
Wayne Snider for his help in collecting data, and to John Benjafield for his many useful suggestions. 




References 


ADAMS-WEBBER, J. (1970). An analysis of the 
discriminant validity of several repertory grid 
indices. Br. J. Psychol. 61, 83-90. 

ADAMS-WEBBER, J. & BENJAFIELD, J. (1973). The 
relation between lexical marking and rating 
extremity in interpersonal judgment. Can. J. 
behav. Sci. 5, 234-241. 

ATTNEAVE, F. (1959). Applications of Information 
Theory to Psychology. New York: Holt. 

BENJAFIELD, J. & ADAMS-WEBBER, J. (1975). 
Assimilative projection and construct balance in 
the repertory grid. Br. J. Psychol. 66, 169-173. 

BENJAFIELD, J. & ADAMS-WEBBER, J. (1976). The 
golden section hypothesis. Br. J. Psychol. 67, 
11-15. 

BENJAFIELD, J. & GREEN, T. R. G. (1978). Golden 
section relations in interpersonal judgement. Br. J. 
Psychol. (in press). 

BERLYNE, D. E. (1971). Aesthetics and 
Psychobiology. New York: 
Appleton-Century-Crofts. 


BOUCHER, J. & OSGOOD, C. E. (1969). The 
Pollyanna hypothesis. J. verb. Learn. verb. Behav. 
8, 1-8. 

EISER, J. R. & MOWER WHITE, C. J. (1973). 
Affirmation and denial in evaluative descriptions. 
Br. J. Psychol. 64, 399-403. 

FRANK, H. (1959). Grundlagenprobleme der 
Informationsästhetik und erste Anwendung auf die 
mime pure. Quickborn: Schnelle. 

FRANK, H. (1964). Kybernetische Analysen 
subjektiver Sachverhalte. Quickborn: Schnelle. 

GARNER, W. R. (1962). Uncertainty and Structure as 
Psychological Concepts. New York: Wiley. 

KELLY, G. A. (1955). The Psychology of Personal 
Constructs. New York: Norton. 

KELLY, G. A. (1969). A mathematical approach to 
psychology. In B. A. Maher (ed.), Clinical 
Psychology and Personality: The Selected Papers 
of George Kelly. New York: Wiley. 


Received 5 January 1977; revised version received 18 March 1977 


Requests for reprints should be addressed to Dr J. Adams-Webber, Department of Psychology, Brock 
University, Region Niagara, St Catharine’s, Ontario L2S 3A1, Canada. 


Br. J. Psychol. (1978), 69, 443-450 Printed in Great Britain 443 


When retrieval cueing fails 


Michael J. Watkins and Endel Tulving 





Does presenting a hint, or retrieval cue, for recall of an event change the memory trace for the event even 
when the cue does not in fact produce recall? An experiment by McLeod, Williams & Broadbent (1971) 
suggests that it may. A conclusion to this effect would have important theoretical implications. In particular, 
it would pose difficulties for specifying trace structure. McLeod et al. observed that a retrieval cue was more 
effective if its target trace had been previously cued, even though this cueing did not elicit recall. Three 
experiments are described which indicate that this result occurs only if the first, ineffectual cue is presented 
along with the second cue; if the second cue is presented alone it is less effective than the first. It is 
concluded that there is currently no evidence that the unsuccessful cueing of an item causes a change in its 
memory trace. 


Being reminded of an event is likely to increase the ease with which the event can be recalled on 
subsequent occasions. But suppose we are given a reminder or retrieval cue that fails to elicit 
recall of the event: is such unsuccessful cueing also likely to change the probability of later 
recall? 

An unsuccessful attempt to cue the recall of an event probably has less effect on the memory 
trace of the event than does a successful attempt. It is less clear, however, whether 
unsuccessfully cueing an item has any effect on its trace. This question is of interest for a 
number of reasons, including its implications for determining the structure of memory traces. 

In an earlier article (Tulving & Watkins, 1975), we suggested that the memory trace could be 
usefully described in terms of the effectiveness of, and relations between, different classes of 
retrieval cues. The technique which we proposed for deriving this description - the reduction 
method — rests firmly on the assumption that a cue that does not effect retrieval does not cause a 
change in the trace. Given this assumption, a second cue can be applied to one and the same 
trace. By varying the order in which different classes of cues are applied, and by adopting a 
subtractive logic, the relation between the different classes of cues can be determined without 
recourse to data for items that had previously been recalled and whose traces may therefore 
have been changed. Such cue relations constitute a comparatively rich description of trace 
structure. 

We shall not recount the details of this technique here, for it is sufficient to note that 
describing trace structure will be a very much more complicated affair if we do not make the 
assumption that unsuccessful cueing leaves no durable effects on the target trace.* Thus, even 
for this reason alone, the question is of some importance. Unfortunately evidence on this issue 
is scant. There is, however, one study that does seem to be relevant. Its results appear to 
indicate that a trace is in fact changed as a result of unsuccessful cueing. The study is one 
reported by McLeod, Williams & Broadbent (1971). The purpose of the present study is to 
propose, and to provide some evidence for, an alternative explanation of these findings. 

* The effects we are referring to are, of course, item-specific effects. There can be little doubt that cueing, 
whether or not it produces recall, will leave a small non-selective effect. The assumption under question is 
that an ineffectual cue will affect its target trace no more nor less than it will affect the other, potential 
targets. 

McLeod et al. presented subjects with a list of words and subsequently presented three 
successive tests, the first a free-recall test, the others cued-recall tests. In the first cued test, a 
cue word was presented for each of the target words that had not been recalled in the free-recall 
test. Each cue was a word associatively related to its target word. Thus for the target word 
WHISTLE the cue word was blow. In the second cued test cues were given for only those target 
items that were recalled in neither the free-recall test nor the first cued test, and like the first 
test the cues were associatively related to the targets. Thus, if WHISTLE had been recalled in 
neither the free-recall test nor the first cued test, the second cued test would include the word 
train added to the unsuccessful cue blow. McLeod et al. found not only that the second-test 
cues facilitated further recall, but also that they elicited recall with a higher probability than did 
the first-test cues. It would seem, therefore, that the unsuccessful cue shown in the first test 
changed the trace, and in particular changed it in a way that increased the likelihood of recall in 
response to the second cue. If so, then our method for describing trace structure (Tulving & 
Watkins, 1975) is in trouble. 

There is, however, an alternative interpretation of the McLeod et al. findings. The higher level 
of recall observed in the second cued test could be due not to the prior unsuccessful cueing in 
the first cued test, but rather to the fact that the ineffectual cue of the first test was presented 
along with the new cue of the second. This interpretation is consistent with the assumption that 
unsuccessful cueing leaves the target trace unchanged. 

This alternative interpretation can be tested simply by comparing second-test recall (of 
previously unrecalled items) under two conditions: (a) with the ineffectual cue of the first test 
re-presented along with the new cue, as was done in the McLeod et al. study; and (b) with the 
new cue presented alone. If we are correct in attributing the higher level of recall in the second 
test in the McLeod et al. experiment to the presence of the ineffectual first-test cue then, 
obviously, performance should be reduced if this cue is omitted and the second-test cue 
presented alone. Furthermore, performance in the second test should no longer be superior to 
performance in the first test. On the other hand, if the greater effectiveness of the second cue is 
due solely to the prior presentation of the first cue, then the presence of the first cue in the 
second test should be of no consequence. 

These alternative predictions were tested in three experiments. The first involved just two 
successive cued-recall tests and no free-recall test, the second and third included a free-recall 
test prior to the cued tests. 


Experiment I 

The procedure followed closely that of McLeod et al. (1971). After being tested for free recall 
of a practice list, subjects studied a critical list and took two successive cued tests. In the first of 
these each target was cued with a single word; in the second the first cue was either replaced by, 
or shown together with, a second cue. 


Method 


The materials — both the study words of the practice and critical lists, and the cue words of the critical list - 
were the same as those used by McLeod et al. (1971);* they are shown in the Appendix. The subjects were 
40 Yale University undergraduates who were tested in groups of from two to four persons. 

All subjects were first given some practice in free association (in what was ostensibly an unrelated 
experiment) with stimulus words that were unrelated to those used in the experimental lists. They then 
studied and attempted to recall first the practice list and then the critical list. Both lists were shown by 
means of a projector one word at a time and at a rate of 5 sec per word. A 30 sec arithmetic distractor task 
was given immediately following presentation of each list. The test for the practice list was one of free 
recall. For the critical list the subjects had been led to expect a free-recall test but they were given instead 
two successive cued tests. 

* We are indebted to Peter McLeod for making these materials available. 

There were two between-subject conditions, with 20 subjects serving in each. For the first cued test the 
two groups were treated alike. Subjects were given a test sheet which included one associative cue word for 
each of the 25 target words. The relation of the cues to the targets was explained and subjects wrote, at their 
own pace, as many target words as they could remember. For the second test all subjects were given a fresh 
sheet which, like the first, included 25 cues, one for each target. The nature of the cues, however, differed 
for the two groups of subjects. For subjects in the double-cue condition, each cue comprised two words, one 
the cue word used in the first test, the other a new word. For subjects in the single-cue condition, each cue 
consisted of just the new word. Again, the subjects responded in their own time. Within each condition the 
allocation of cues to the two tests was balanced between subjects. 

It will be noted that the testing procedure differed from that of McLeod et al. in that all target words were cued in 
both tests. The reason for this discrepancy, however, was merely to simplify the procedure. As in the 
McLeod et al. study, interest will focus on the probability of second-test recall of those items not recalled in 
the first test, P(R₂|R̄₁); and on the relation of this conditional probability to the probability of recall in the 
first test, P(R₁). 


Results and discussion 


A complete summary of the results is provided in Table 1. The data of interest, the P(R₁) and 
P(R₂|R̄₁) scores for the two conditions, are shown in Table 2. The between-condition difference in 
P(R₁) scores must be attributed to sampling variability, since treatment was identical up to this 
point; indeed, the difference was not significant, t < 1. From our interpretation of the 
McLeod et al. findings, the single-cue condition should show P(R₂|R̄₁) to be no higher than P(R₁): 
in fact, it was significantly lower, t = 5.52, d.f. = 19, P < 0.01. A somewhat stronger test of our 
views is afforded by our prediction of an interaction between condition and test such that, 
relative to their respective P(R₁) scores, the single-cue condition should yield a lower P(R₂|R̄₁) 
score than the double-cue condition. This proved to be the case, and the effect was significant, 
t = 2.38, d.f. = 38, P < 0.02. This effect indicates that the re-presentation of the ineffectual 
first-test cue does indeed enhance performance in the second test. 
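
The two probabilities of interest can be recovered, at least approximately, from the pooled item counts of Table 1. The following sketch is illustrative only: the class `TwoTestCounts` and its method names are hypothetical, and pooling over subjects is an assumption, since the tabled probabilities are described as means and may differ slightly in the third decimal place from pooled proportions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoTestCounts:
    """Item counts cross-classified by recall in two successive cued tests."""
    both: int         # recalled in the first and the second cued test
    first_only: int   # recalled in the first cued test only
    second_only: int  # recalled in the second cued test only
    neither: int      # recalled in neither cued test

    def p_first(self) -> float:
        """P(R1): proportion of all items recalled in the first cued test."""
        total = self.both + self.first_only + self.second_only + self.neither
        return (self.both + self.first_only) / total

    def p_second_given_not_first(self) -> float:
        """P(R2 | not R1): second-test recall among items missed in the first test."""
        return self.second_only / (self.second_only + self.neither)

# Counts taken from Table 1 (Expt. I), pooled over the 20 subjects per condition.
single_cue = TwoTestCounts(both=177, first_only=71, second_only=46, neither=206)
double_cue = TwoTestCounts(both=177, first_only=29, second_only=100, neither=194)

for name, counts in (("single-cue", single_cue), ("double-cue", double_cue)):
    print(name, round(counts.p_first(), 3), round(counts.p_second_given_not_first(), 3))
```

Only the items missed in the first test enter the conditional probability, so it is unaffected by how many items were recalled in both tests.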

Although in the second test subjects in the double-cue condition fared rather better than 
subjects in the single-cue condition, they did not score as well in this test as they had in the first 
test. In this respect our data differ from those of McLeod et al. One possible reason for this 
discrepancy is obvious. McLeod et al. interpolated a free-recall test prior to the first cued test, 
and they considered only those items not produced in free recall. If we make the not 
unreasonable assumption that items that can be retrieved under free-recall instructions will have 
a relatively high probability of being recalled in each of the cued tests, then our inclusion of 
these items should serve to boost P(R₁); on the other hand, though their inclusion should also 
boost P(R₂), it need not increase P(R₂|R̄₁), or at least not to the same extent. 


Table 1. Summary of data: Expt. I 

                                     Second cued test 

                   Single-cue condition              Double-cue condition 
First 
cued                           Not                               Not 
test             Recalled   recalled   Total       Recalled   recalled   Total 
Recalled           177         71       248          177         29       206 
Not recalled        46        206       252          100        194       294 
Total              223        277       500          277        223       500 


Table 2. Mean first test, P(R₁), and conditional second test, P(R₂|R̄₁), recall probabilities 

Condition        P(R₁)      P(R₂|R̄₁) 
Single-cue       0.496      0.182 
Double-cue       0.412      0.340 

Our assumption (Tulving & Watkins, 1975) that an ineffectual cue leaves no durable effect on 
the target trace should, however, hold for all items, including the subset that cannot be retrieved 
under free-recall instructions. If, like McLeod et al., we restrict consideration to such a subset, 
P(R₁) should be reduced but, provided the second cue is presented alone, it should be no lower 
than P(R₂|R̄₁). On the other hand, when the ineffectual first-test cue is re-presented with the new 
cue in the second test we should presumably replicate McLeod et al.’s results and find P(R₁) to 
be lower than P(R₂|R̄₁). These predictions are tested in Expt. II. 


Experiment II 

Method 

The method was the same as that of Expt. I except for the addition of a free-recall test for the 
critical list. This test was given immediately after the distractor task (thus fulfilling subjects’ 
expectations); no time limits were imposed. As before, the subjects were 40 Yale 
undergraduates, with 20 serving in each condition; none, of course, had participated in Expt. I. 


Results and discussion 


A detailed summary of the results is given in Table 3. For the present purposes we shall restrict 
consideration to the P(R₁) and P(R₂|R̄₁) scores, shown in Table 4. Also shown in Table 4 are the 
corresponding scores for the McLeod et al. experiment. 

Consider first the cued-recall performance without regard to free recall. These results, shown 
in the left half of Table 4, replicate those of Expt. I (Table 2). Thus for the single-cue condition, 
P(R₁) is considerably, and significantly, greater than P(R₂|R̄₁) (t = 6.50, d.f. = 19, P < 0.01); 
whereas in the double-cue condition the difference is in the same direction but smaller and not 
significant, t = 1.12, d.f. = 19, P > 0.10. As in Expt. I the interaction between condition and cued 
test was significant, t = 2.62, d.f. = 38, P < 0.01. 

Consider now cued recall for the subset of items not given in the free-recall test. Consistent 
with our speculations, the effect of this restriction was to reduce P(R₁) relative to P(R₂|R̄₁). 
However, the predicted interaction between condition and cued test remained, t = 2.35, 
d.f. = 38, P < 0.05. Also, in the single-cue condition, P(R₁) remained higher than P(R₂|R̄₁), 
t = 3.52, d.f. = 19, P < 0.01. For the double-cue condition, on the other hand, ignoring items 
given in the free-recall test resulted in a higher P(R₂|R̄₁) than P(R₁) score. As is evident from 
Table 4, the double-cue results resemble closely those of McLeod et al.: P(R₂|R̄₁) exceeds P(R₁) 
by 25 per cent in our double-cue condition, and by 19 per cent in the McLeod et al. experiment. 


Table 3. Summary of data: Expt. II 

                                             Second cued test 

Free-          First             Single-cue condition              Double-cue condition 
recall         cued 
test           test            Recalled  Not recalled  Total     Recalled  Not recalled  Total 

Recalled       Recalled          146          32        178        146          10        156 
               Not recalled       29          24         53         35          27         62 
               Total             175          56        231        181          37        218 

Not recalled   Recalled           62          24         86         62           8         70 
               Not recalled       31         152        183         66         146        212 
               Total              93         176        269        128         154        282 

Total          Recalled          208          56        264        208          18        226 
               Not recalled       60         176        236        101         173        274 
               Total             268         232        500        309         191        500 


Table 4. Mean first test, P(R₁), and conditional second test, P(R₂|R̄₁), recall probabilities for the 
single-cue and the double-cue conditions in Expt. II, and for the McLeod et al. (1971) experiment 

                                                       Items not given in 
                                 All items             free-recall test 
                              P(R₁)    P(R₂|R̄₁)       P(R₁)    P(R₂|R̄₁) 
Single-cue condition          0.528    0.254          0.320    0.169 
Double-cue condition          0.452    0.369          0.248    0.311 
McLeod et al. (double-cue) 
experiment                      -        -            0.341    0.434 
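
The restriction to items not produced in free recall, and the sense in which the conditional second-test probability "exceeds" the first-test probability, can be illustrated from the pooled counts of Table 3. This is a sketch under stated assumptions: the dictionary layout and the function `summarize` are hypothetical, the counts are pooled over subjects, and the excess is taken to be a relative (ratio) difference.

```python
# Pooled counts from Table 3 (Expt. II), restricted to items NOT given in free recall.
# "first": recalled in the first cued test; "second_only": recalled in the second
# cued test but not the first; "neither": recalled in neither cued test.
not_free_recalled = {
    "single-cue": {"first": 86, "second_only": 31, "neither": 152},
    "double-cue": {"first": 70, "second_only": 66, "neither": 146},
}

def summarize(counts: dict) -> tuple:
    """Return P(R1), P(R2 | not R1), and the relative excess of the latter over the former."""
    total = counts["first"] + counts["second_only"] + counts["neither"]
    p_first = counts["first"] / total
    p_second_given_not_first = counts["second_only"] / (
        counts["second_only"] + counts["neither"]
    )
    relative_excess = p_second_given_not_first / p_first - 1
    return p_first, p_second_given_not_first, relative_excess

results = {name: summarize(counts) for name, counts in not_free_recalled.items()}
for name, (p1, p2, excess) in results.items():
    print(name, round(p1, 3), round(p2, 3), f"{excess:+.0%}")
```

On this reading the double-cue condition shows a positive relative excess and the single-cue condition a negative one, which is the pattern described in the text.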


Experiment III 

The purpose of Expt. III was to check the findings of Expt. II using a procedure which 
resembled more closely that of McLeod et al. Specifically, the cued tests in Expt. III did not 
include cues for targets recalled in a previous test, since it is conceivable, perhaps, that the 
critical conditional recall probabilities derived in Expt. II could have reflected the fact that all 
targets were cued in both cued tests. 


Method 


The procedure was essentially the same as that for Expt. II, except that once a target had been 
recalled, either in the free-recall test or the first cued test, it was not cued in a subsequent test. 
The subjects were 24 undergraduates, 12 serving in the single-cue condition, and 12 in the 
double-cue condition. In order to simplify the administration of the tests, the cues were given 
orally (in the subject’s own time); and, of course, all subjects were tested individually. 


Results 


The results were essentially the same as those of Expt. II. The proportion of items produced in 
the free-recall test was 53 per cent in the single-cue condition, and 59 per cent in the double-cue 
condition. More important are the cued-recall data. In the single-cue condition, P(R₁) = 0.314 
whereas P(R₂) = 0.128; this difference was statistically significant, t = 2.809, d.f. = 11, P < 0.02. 
In the double-cue condition, P(R₁) was less than P(R₂) (0.281 vs. 0.328) but not significantly so, 
t = 1.170, d.f. = 11. The interaction between condition and cued test was significant, t = 2.822, 
d.f. = 22, P < 0.01. 

In short, the main findings of Expt. III replicate those of Expt. II, and therefore indicate that 
the latter findings were not dependent upon a procedure in which successive tests involved all 
target items. 


Conclusion 

We conclude from this study that the findings of McLeod et al. (1971) do not necessitate the 
view that an unsuccessful attempt to cue a target item causes a change in the target trace. 
McLeod et al. found that the probability of prompting recall of an item for the first time was 
greater with a second cueing attempt than the first, even though the two cues were normatively 
equivalent. Our data, however, have shown that this result occurs only if the ineffectual cue of 
the first test is re-presented along with the second cue. When the second cue is presented alone 
it is less effective than the first. It seems, then, that we can retain our assumption (Tulving & 
Watkins, 1975) that an ineffectual retrieval cue leaves no lasting effect on the target trace. 

At the same time, our results do not prove that ineffectual cueing has no lasting effects on the 
target trace: they merely indicate that the McLeod et al. data do not demand such an 
interpretation. Thus it could be that the first cue did leave a lasting effect but our procedure was 
not sensitive enough to detect it. Indeed, there are at least two reasons for supposing our 
procedure to be insensitive. (a) Whether or not cues cause significant changes in their respective 
target traces, there is little doubt that they have small but cumulative, and durable, non-selective 
effects (see footnote on p. 443). Since more of this test-produced interference will have occurred 
by the second test of an item than by the first, cueing in the second test will be at a 
disadvantage. (b) Several sorts of item-selection effects are possible, and each could work 
against the second cue being more effective than the first. For instance, the Bilodeau & Howell 
(1965) norms indicate a positive, though small (r = 0.14), correlation between the two sets of 
cues of the McLeod et al. material. Perhaps more important are the variations in ‘strength’ of 
the traces. A subject may attend particularly well to a certain study word, perhaps because it is 
the first item in the list, or because of some entirely idiosyncratic reason, and this superior 
study will be reflected in P(R₁) but not necessarily in P(R₂|R̄₁). These item-selection effects will 
be aggravated by the use of two sets of retrieval cues that are of the same type, namely, 
associative. True, problems due to item variation will be reduced by restricting consideration to 
items not given in a prior free-recall test, but the variation will be merely truncated, not 
eliminated. 


Some comments on the effect of re-presenting ineffectual cues 


Consider now the finding that second-test cues were more effective in the presence of ineffectual 
first-test cues. There is currently insufficient evidence to warrant a confident explanation of this 
effect, but three possibilities suggest themselves. 

First, appeal can be made to demonstrations that an item that cannot be recalled under a given 
set of conditions may be subsequently recalled under nominally identical conditions. This 
phenomenon (usually referred to as reminiscence) is well established with, for example, 
free-recall and paired-associate procedures, and it may well occur with the extra-list cueing 
procedures under discussion. If so, a cue that was ineffectual in the first test would with some 
probability prove effective in the second, and thus recall would be higher in the double-cue than 
in the single-cue condition. Though we cannot rule out such an explanation, reminiscence effects 
are typically rather modest, and we feel that they would probably not be sufficient to account for 
the large effects observed in the present experiments. 

A second approach to explaining the beneficial effect of re-presenting an ineffectual cue along 
with a new cue is to assume that each cue tends to bias the encoding of the other. For instance, 
while the first-test cue blow may fail to effect the recall of the target word-event WHISTLE, its 
re-presentation along with a new cue train may cause either or both cues to be encoded in a way 
more compatible with the target trace than would be the encoding of the cue train presented 
alone. McLeod et al. took the precaution of choosing their cues such that they did not occur as 
free-association responses to each other with P > 0.01; however, while this precaution may have 
reduced any mutual encoding bias, there is little reason to suppose that it eliminated it. 

McLeod et al. were aware of this possibility and conducted a control experiment to test it. 
Specifically, subjects who had not been shown the target list were given one cue word for each 
target word with instructions to free associate. They were then given the same instructions but 
with each target now cued with both cues, presented as a pair. McLeod et al. found that the 
probability of producing the response to the two cues was significantly less than that for the 
single cue. This result is apparently contrary to the notion that the second cue is encoded more 
appropriately in the presence of the first cue. This argument is not, however, entirely 
convincing. It could be that the advantage of an appropriately biased encoding is manifest in 
episodic memory but not in semantic memory. Moreover, McLeod et al.’s control experiment 
does not rule out the possibility of some beneficial encoding bias even in semantic memory; the 
effect could be masked by the differential interference and item-selection effects mentioned 
earlier. 

A final possibility for explaining the greater effectiveness of a new cue when presented in the 
company of a previously ineffectual cue is to assume that even an ineffectual cue makes contact 
with the trace and causes a transitory change in it, a change that enhances the probability of a 
new cue being effective. This change, however, being short-lived, would not be manifest if the 
second cue were temporarily displaced, as in our single-cue condition. This view is basically 
similar to the logogen model of Morton (1969). 

We have discussed three possible factors that could underlie the beneficial effect of adding a 
previously ineffectual cue to a second cue, and no doubt others could be proposed. While 
evaluation of these factors must await further research, our own hunch is that more than one is 
involved. 
Appendix 

Materials used in Expts I and II, and in McLeod et al.'s (1971) experiment 

                 Critical list 
Practice 
list         Target words     Cue words 

Clock        Whistle          Blow      0.01*     Train       0.06 
Ripe         Head             Hair      0.16      Shoulder    0.05 
Bird         Chocolate        Milk      0.03      Cake        0.08 
Deep         Square           Round     0.36      Cube        0.31 
Yellow       Tall             Small     0.02      Long        0.03 
Signal       Mind             Memory    0.19      Thought     0.12 
Loud         Work             Play      0.03      Labour      0.70 
Tomato       Blue             Sky       0.70      Green       0.11 
Sweet        Hand             Foot      0.10      Finger      0.37 
Car          Pork             Meat      0.03      Lamb        0.02 
Hollow       Smooth           Silk      0.11      Rough       0.31 
Fire         Wet              Dry       0.55      Water       0.21 
Tiny         Child            Baby      0.21      Kid         0.23 
Bottle       Black            White     0.63      Dark        0.16 
Want         Music            Sound     0.07      Song        0.17 
Wide         Hard             Easy      0.64      Rock        0.25 
Hammer       Stomach          Ache      0.01      Food        0.02 
Book         Cold             Hot       0.74      Ice         0.47 
Nice         King             Ruler     0.28      Crown       0.43 
Ear          High             Low       0.69      Mountain    0.18 
Piano        Flower           Bud       0.38      Blossom     0.65 
Kind         Girl             Woman     0.14      Friend      0.09 
Grass        Slow             Quick     0.25      Swift       0.26 
Father       Table            Chair     0.23      Cloth       0.02 
Window       Light            Bright    0.20      Bulb        0.83 


* These figures are the probabilities, according to the Bilodeau & Howell (1965) norms, of the cue words 
effecting production of the target words in discrete association. 
Measurement of categorical clustering in free recall 


Richard T. White and Mavis E. Kelly 





The problem of whether one can assume a monotonic relationship between the construct, tendency to 
organize verbal stimuli in free recall, and current clustering indexes is discussed in the light of Colle’s 
criticism that clustering indexes have no theoretical basis. It is argued that there is a primitive theory which 
assumes that clustering measures increase as a function of increases in the construct. Consideration is also 
given to Shuell’s claim that choice between indexes depends on the psychological and statistical assumptions 
one is prepared to accept. This claim is endorsed but it is argued that it is possible to distinguish between 
indexes which share the same assumptions by consideration of the basic properties which they need to 
possess in order to be suitable research instruments. 


The tendency to organize verbal stimuli into identifiable groups is a psychological construct 
which is considered an important mechanism for overcoming storage limitations in parts of the 
human memory system (see Mandler, 1967). The experimental paradigm used to study this 
construct has not changed significantly since Bousfield’s (1953) original research. Typically, 
participants are required to free recall as many words as possible from a stimulus list consisting 
of randomly presented words which the investigator sees as falling into a small number of 
superordinate categories. Recall lists are inspected for evidence that participants are clustering 
words from the same category. The number of repetitions in the list, i.e. the number of times a 
word is followed immediately by another word from the same superordinate category, is the 
basic component of this evidence. 

Although the experimental paradigm has remained stable and is widely accepted, several 
different indexes for quantifying recall list evidence have been employed in order to measure the 
amount of the construct possessed by individual respondents. Researchers are now faced with a 
bewildering array of indexes and an almost equally varied set of criteria on which to compare 
them. In consequence, they often appear to choose an index arbitrarily for their analyses, or at 
least rarely state reasons for their choice. Although the merits and limitations of particular 
indexes have been actively debated (e.g. by Dunn, 1969; Hudson & Dunn, 1969; 
Dalrymple-Alford, 1970, 1971; Frankel & Cole, 1971; Roenker, Thompson & Brown, 1971; 
Frender & Doubilet, 1974), researchers seem to have gained little from the points raised, perhaps 
because most protagonists have concentrated on minor differences between basically similar 
indexes and have ignored more important questions concerning the rationale for their use. Colle 
(1972) and Shuell (1975) have broadened the scope of the debate in different ways; their 
arguments must be considered as the first step in assisting researchers to choose indexes 
rationally rather than arbitrarily. 


Colle: Hypothesis testing and parameter estimation in measurement of clustering 


Colle’s (1972) arguments are directed against the procedure of equating measures of amount of 
clustering in individual free recall lists directly with the amount of the construct, tendency to 
organize verbal stimuli, present in the individual. He claims that measurement of tendency to 
organize requires the construction of a clustering scale and adds that confusion arises when 
researchers fail to realize that ‘the solution to the scaling problem can be obtained only by 
stating a correct scaling theory’ (p. 625). Such a theory would describe the mechanism producing 
clustering and thus clarify the relation between the psychological construct and the experimental 
outcomes. As a minimum requirement, the theory should include a function relating a parameter 
representing the quantity of the construct to the expected properties of the recall list that might 
be produced by an individual. The clustering scale would be based on this parameter. For 
purposes of illustration only, Colle (p. 631) suggested a simple function. In reality a much more 
detailed function might need to be developed, probably including several parameters in addition to 
that representing the degree to which individuals tend to organize verbal stimuli in free recall. 
There could be a parameter to account for variations in human memory storage capacities; one 
to allow for the degree to which the respondent’s ability to recall is affected by the serial order 
of the stimulus list; others to allow for the ability of the respondent to deal with variations in the 
properties of the stimulus list such as number of categories, number of words per category, types 
of category (e.g. concrete vs. abstract), and degree of intra-category item association. The 
theory would then effectively consist of a function relating this complex system of parameters to 
the expected values of observable properties of the recall list, such as number of words recalled 
from each category and number of repetitions, for each respondent. 

The search for a model of this type is a noble quest, but there is reason to continue to engage 
in present research methods as well. Colle’s procedure faces a formidable practical obstacle in 
the difficulty of obtaining enough data to make a powerful test of the accuracy of any one 
model, or for comparing one model with another. Since the value of the parameter (or worse, 
parameters) can vary from one person to another, estimates must be obtained separately for 
each person. In testing all but the most wildly inaccurate models this will mean that several 
recall lists may have to be obtained from each person in order to reduce the error of estimate of 
the parameter. Several recall lists will be essential where the model contains parameters for the 
effect of variations in properties of the stimulus list. Until we invent experimental techniques to 
reduce random variation in a respondent’s performance, it will be difficult to prove that one 
model is superior to another and it may not be possible to obtain unequivocal support for a 
single model. Despite its potential power and theoretical attraction, Colle’s recommended 
research procedure may currently be less useful than the present, theoretically less defensible, 
methods of determining what variables affect clustering performance. 

Concern for the practicality of Colle’s procedure does not counter his claim that current 
clustering indexes do not constitute scales of measurement and at best are suitable only for 
testing whether clustering has occurred at more than chance level. The general question he raises 
is whether measures which may be used in testing the null hypothesis of no tendency to organize 
can also be used to provide a scale of the degree to which the construct is present. 

Despite Colle’s arguments, there seems to be no reason why a particular index could not 
provide a scale of tendency to organize, if it is based on assumptions (additional to the basic 
statistical assumptions needed for testing the null hypothesis) about the relation between the 
construct and the index. It is possible to argue that the basic notion underlying all current 
indexes which purport to measure tendency to organize verbal stimuli is that the number of 
repetitions in the recall list is a function of the construct. Bousfield (1953, p. 229) clearly held 
this assumption: ‘The theoretical significance of this undertaking derives in part from the 
assumption that clustering is a consequence of organization in thinking and recall’. 
Unfortunately this is a primitive notion which omits details of how the construct might 
determine other recall list properties such as total number of words recalled and number of items 
recalled from each category, as well as how this relation is affected by the properties of the 
stimulus list. These of course are the details which would be provided by a ‘correct’ scaling 
theory, such as Colle advocates. 

It is observed that many other fields of psychology have operated, apparently usefully, 
without detailed models of the relation between a construct and observable behaviour. 
Presumably this has been necessary because the construct itself is complex or because the 
variables which might affect it are either not known or not measurable with the techniques 
available. Intelligence scales, for instance, rest on the notion that there is a monotonic increasing 
relation between the construct of intelligence and the number of items correct on a given test. 
Support for continued use of such scales lies in their usefulness in predicting performance in 
other spheres of activity. Naturally it would be desirable to have a detailed mathematical model 
relating intelligence to test performance, but less detailed models are useful in the case where 
appropriate theoretical models have not been developed. We are currently in very early stages 
indeed in investigating a number of cognitive constructs and need to recognize that the 
development of measurement and theory are ongoing and interrelated processes. It is not the 
case that useful measures are developed only at advanced stages of theory construction. 
Naturally, current clustering indexes are imprecise tools for basic research on human memory 
processes, but in those areas of research where the interest lies in comparing groups or 
evaluating single groups in terms of their tendency to organize, these indexes may well have a 
useful role. For these reasons it seems preferable to recommend that the type of research which 
Colle advocates should be undertaken in addition to, not instead of, current methods of 
investigation. 


Shuell: Assumptions underlying measures of clustering 


In contrast to Colle, Shuell (1975) has kept his comments within the bounds of the primitive 
model, which lacks any detail beyond the assumption that, all other things being equal, a greater 
number of repetitions reflects a greater tendency to organize. Shuell argues that debate about which index is 
best is futile, since the answer depends on the purposes of the researcher and the concomitant 
psychological and statistical assumptions he is prepared to accept. This argument should bring 
some order to the jumble of available clustering indexes, but taken alone it does not aid the 
practical researcher a great deal since, as will be shown subsequently, all but a few clustering 
indexes rest on the same psychological assumptions. In order to clarify the situation further, we 
propose to go beyond Shuell’s recommendation that researchers should be aware of their basic 
assumptions when choosing an index, by giving consideration to the ways in which indexes may 
be used in research. Assumptions underlying the indexes are discussed and each index is 
examined to see whether it does have the properties which might be deduced from these 
underlying assumptions. Where a set of indexes is found to have basically identical assumptions, 
criteria on which they may be compared are suggested. 


Uses of clustering indexes 


We can identify four uses of clustering indexes. Consideration of these uses leads to deduction 
of four properties that indexes may need to possess in order to be suitable for empirical 
research. 


Use 1 


The value of the index for a single recall list may be used to decide whether the respondent 
shows any evidence of tendency to organize verbal stimuli. Although no examples of this use 
appear to have been published, it may be of interest in clinical or diagnostic situations. This use 
requires estimation of the probability of a particular response occurring if the respondent were 
making no attempt to cluster the words in the recall list into the categories predetermined by the 
researcher. Therefore the index must have a known probability function under the condition that 
the respondent is making no attempt to cluster. 


Use 2 


The values of the index for the responses of members of a sample are used to decide whether 
the population from which the sample was drawn would cluster the word list at a greater than 
chance level. Examples of this use are found in Rossi & Rossi (1965) and Moely, Olson, Halwes 
& Flavell (1969). This is clearly a case of statistical null hypothesis testing. To be fit for this use, 
an index must have an expected value that is the same for all members of the sample if they do 
not possess the construct in any degree, that is they have no tendency to organize verbal stimuli 
in recall; and this expected value must be the same for all respondents despite variations in 
numbers of words and their distribution among categories in the recall lists. The null hypothesis 
can then take the form that the population has a mean value of the index which is equal to the 
expected value. For a sign test to be applicable to the null hypothesis, the index must, where 
there is a tendency to organize, be consistently greater (or consistently less) than the expected 
value. If a more powerful non-parametric test such as the Wilcoxon test is to be applied, there 
must be the further property of a monotonic relation between the index and the construct, again 
irrespective of variations in the number of words in the recall lists. (Note that it is the kernel of 
Colle’s objection to current indexes that a monotonic relation between the index and the 
construct has not been demonstrated.) For application of the parametric t test the index should 
in addition be distributed normally. 
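
The sign test mentioned for use 2 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the sample values are hypothetical, the index is taken to have a known expected value under no tendency to organize (zero for a difference score), and values equal to the expected value are dropped as ties.

```python
from math import comb

def sign_test_upper_p(index_values, expected_value):
    """One-sided sign test: chance of at least this many values above expected_value
    when values above and below are equally likely (ties are dropped)."""
    diffs = [v - expected_value for v in index_values if v != expected_value]
    n = len(diffs)
    above = sum(d > 0 for d in diffs)
    # Upper tail of a Binomial(n, 1/2) distribution
    return sum(comb(n, i) for i in range(above, n + 1)) / 2 ** n

# Hypothetical index values for ten respondents (not data from the study)
sample = [0.8, 1.2, -0.3, 0.5, 2.1, 0.9, -0.1, 1.7, 0.4, 0.6]
p_value = sign_test_upper_p(sample, expected_value=0.0)
```

The test uses only the direction of each deviation, which is why it needs no more than the property that the index lies consistently on one side of its expected value when organization is present.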


Use 3 


The values of the index for the responses of members of several samples are used to judge 
whether the populations from which they were drawn possess different mean degrees of the 
construct. This use includes investigations of the effect of variations in type of stimulus list as 
well as variations in participants’ populations. Examples are Shultz, Charness & Berman (1973), 
Thompson & Roenker (1971) and Wachs & Gruen (1971). In this case, the statistical null 
hypothesis is that the mean values of the index are equal in all populations. To allow a 
non-parametric test to be applied there must be a monotonic relation between the construct and 
the index; and for a parametric test, either t test or analysis of variance, the index should be 
normally distributed. However, it is not necessary to know the expected value of the index under 
conditions of no tendency to organize verbal stimuli. 


Use 4 


The correlation between the index and other variables such as intelligence test scores, age, 
amount of pre-training, and so on, may need to be known. Examples are not prevalent in the 
literature. Again there must be a monotonic relation between the construct and the index, and 
preferably a normal distribution of the index. 


Notation 


Before we discuss assumptions underlying indexes, it will be convenient to set out a system of 
notation. Unfortunately no standard system has been used throughout the development of, and 
controversy over, clustering indexes. This has not aided practical researchers in overcoming the 
confusions of the present situation. The system used here owes most to Mood (1940) and Dunn 
(1969). 


Stimulus list 


x, number of categories 

p, total number of words 

p_i, number of words in category i 

{p_i}, the set of values of p_i, i = 1, ..., x 


Recall list 


k, number of categories, k ≤ x 

r, total number of words, r ≤ p 

r_j, number of words in category j 

{r_j}, the set of values of r_j, j = 1, ..., k 

S, number of repetitions, i.e. the number of times a word of the jth category is followed by 
another word of the same category, summed over all j. 

T, number of runs, i.e. the number of sequences of words from the same category bounded by 
words from different categories or the ends of the list. 

It can be shown that S + T = r. For example, suppose the stimulus list consists of four categories 
with ten words in each category, and the recall sequence is AABAAACCCBD. Then 

x = 4, p = 40, {p_i} = 10, 10, 10, 10 

k = 4, r = 11, {r_j} = 5, 2, 3, 1 

T = 6, S = 5 
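
The counting of repetitions and runs in the example above can be checked mechanically. The following is a minimal sketch; the function name is hypothetical, and a recall list is assumed to be coded as a string with one category letter per recalled word.

```python
from itertools import groupby

def repetitions_and_runs(recall):
    """Return (S, T) for a recall sequence of category labels."""
    # Each maximal block of identical adjacent labels is one run
    run_lengths = [len(list(block)) for _, block in groupby(recall)]
    T = len(run_lengths)                    # number of runs
    S = sum(n - 1 for n in run_lengths)     # repetitions inside each run
    return S, T

recall = "AABAAACCCBD"
S, T = repetitions_and_runs(recall)
print(S, T, len(recall))   # 5 6 11, so S + T = r
```

A run of n words contributes n - 1 repetitions, which is why S + T always equals r.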


Postulate and probabilities 


P_0: the postulate that none of the members of a population possess the construct, tendency to 
organize verbal stimuli, to any degree. 

p(T | {r_j}, P_0): the probability of an observed or smaller value of T occurring when P_0 is true, for 
a particular set {r_j}. 

p(T | r, P_0): the probability of an observed or smaller value of T occurring when P_0 is true, for a 
particular value of r. 

Note that p(T | {r_j}, P_0) = p(S | {r_j}, P_0) and p(T | r, P_0) = p(S | r, P_0), where the S probabilities are 
those of observed or greater values of S. 
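
For short recall lists the probability of an observed or greater value of S, given the set of category counts, can be obtained exactly. The sketch below is an illustration only: the function names are hypothetical, and it assumes that under the postulate every ordering of the recalled words is equally likely, so that each next word is drawn at random from the words not yet placed.

```python
from fractions import Fraction
from functools import lru_cache

def null_distribution_of_S(counts):
    """Exact distribution of S over all equally likely orderings of the recalled words."""
    @lru_cache(maxsize=None)
    def dist(remaining, last):
        # remaining: words still to be placed, per category; last: previous category index
        total = sum(remaining)
        if total == 0:
            return ((0, Fraction(1)),)
        out = {}
        for j, n in enumerate(remaining):
            if n == 0:
                continue
            rest = remaining[:j] + (n - 1,) + remaining[j + 1:]
            step = 1 if j == last else 0          # a repetition if the category is unchanged
            for s, prob in dist(rest, j):
                out[s + step] = out.get(s + step, 0) + Fraction(n, total) * prob
        return tuple(sorted(out.items()))
    return dict(dist(tuple(counts), -1))

def upper_tail(counts, s_observed):
    """Probability of s_observed or more repetitions under random ordering."""
    return sum(p for s, p in null_distribution_of_S(counts).items() if s >= s_observed)

p_value = upper_tail((5, 2, 3, 1), 5)   # counts and S from the AABAAACCCBD example
```

Because the probabilities are kept as exact fractions, the mean of the distribution can be compared directly with the closed-form expected value used by the later indexes.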


Indexes and underlying assumptions 


Eleven clustering indexes are defined below. This set includes those most widely used in the past 
20 years, the ones currently most popular, and some important recent contributions. It is argued 
that the indexes fall into three groups which differ in terms of the psychological model used to 
describe the relation between the construct and the index. The first group consists of a single 
index, S. The assumption underlying this index is that the construct affects the number of 
repetitions and the number of words in the recall list. The second group (indexes 2 to 10) is 
based on a model in which the construct is held to affect the number of repetitions for a given 
recall list, but the relation between the construct and both number of words recalled and 
distribution of words among the categories is unspecified. The third group consists of the last 
index, Dy, where the relation between the construct and the number of words recalled is not 
specified, but for a given number of words the construct is held to affect both the number of 
repetitions and the distribution of words among the categories. 

1. S. The number of repetitions in the recall list. This is the basis of the original clustering 
indexes developed by Bousfield (1953). S (or its complement, T) remains the basis of all other 
clustering indexes. The fact that larger S values are more likely to occur with larger r values 
need not be a barrier to its use as an index. If the construct is held to determine both r and S|r, 
then S is a valid, though coarse, measure of the construct. 

2. LR= S/r. Bousfield (1953) proposed this index which may appear to be a simple, logical 
modification of S in order to correct for the greater possibility, and even likelihood, of larger S 
values with larger values of r. However, the addition of r as a divisor reflects a change in the 
psychological model. Whereas S implies that the construct influences positively both the number 
of repetitions and the number of words recalled, LR is based on a model in which the construct 
is held to influence positively the number of repetitions, but in which the relation between the 
construct and number of words recalled is left unspecified. 

3. RR = S/(r - 1). Known as the Ratio of Repetition, this was used by Cohen & Bousfield 
(1956). It differs from LR by the use of r - 1 in the denominator instead of r. This change 
appears to be a sensible, ad hoc modification rather than a change in the psychological model: 
with r words recalled there are r - 1 occasions where one word in a list follows another, and 
since S is a count of these repetitions, it may have seemed reasonable to use r - 1 rather than r in 
any attempt to allow for the advantage possessed by a longer list. The name Ratio of Repetition 
reflects this reasoning. 

4. MRR = S/max S, where max S = r - k. This index is called the Proportion of Repetition 
index by Kagan (1966) and Robinson (1966). It was used by Bower, Lesgold & Tieman (1969) 
under the name Modified Ratio of Repetition. The psychological model is the same as for LR 
and RR. It differs from these indexes on a logical point. Since the greatest possible number of 
repetitions in a single category is r_j - 1 and hence the maximum total is Σ (r_j - 1) = r - k, it could 
be argued that this is a better denominator than r - 1. 
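
The three ratio indexes defined so far differ only in their denominators, as a short sketch makes plain. The function name is hypothetical and the recall list is again assumed to be a string of category letters; MRR is undefined when every recalled word comes from a different category (r = k), a case this sketch does not handle.

```python
from itertools import groupby

def ratio_indexes(recall):
    """LR, RR and MRR for a recall sequence of category labels."""
    r = len(recall)                               # words recalled
    k = len(set(recall))                          # categories represented
    S = r - sum(1 for _ in groupby(recall))       # repetitions = r minus number of runs
    return {"LR": S / r, "RR": S / (r - 1), "MRR": S / (r - k)}

values = ratio_indexes("AABAAACCCBD")   # S = 5, r = 11, k = 4
```

For this list the denominators are 11, 10 and 7, so the same S yields a progressively larger index.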

5. S/Opt S, where Opt S is the maximum number of repetitions that could be obtained from 
the given p_i values and the observed r, with {r_j} allowed to vary within these bounds. 

For instance, if {p_i} = 5, 5, 5, 5 and r = 7, the maximum number of repetitions would be 
obtained from recall strings consisting of two blocked categories, such as AAAAABB or 
CCCDDDD. In this example Opt S = r - 2, but it can range from r - 1 to r - k, depending on the 
relative sizes of r and the p_i values. 
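
One way to compute Opt S is to fill the largest stimulus categories first and count how many categories are needed to hold r words. This is a sketch of that reasoning rather than Gold & Cowles's own procedure; the function name is hypothetical and r is assumed not to exceed the total number of stimulus words.

```python
def opt_S(p, r):
    """Largest number of repetitions attainable with r recalled words,
    given the stimulus category sizes p (assumes r <= sum(p))."""
    remaining, categories_used = r, 0
    for size in sorted(p, reverse=True):      # fill the largest categories first
        if remaining <= 0:
            break
        remaining -= min(size, remaining)
        categories_used += 1
    # r words blocked into the fewest possible categories give r - categories_used repetitions
    return r - categories_used

print(opt_S([5, 5, 5, 5], 7))   # 5, i.e. r - 2 as in the example
```

Using the fewest categories maximizes S because each additional category in the recall list costs exactly one repetition.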

The index S/Opt S was proposed by Gold & Cowles (1973). Unlike earlier indexes it takes 
some account of the relation between the stimulus list and the recall list, but still rests on the 
same psychological model, in which the construct is considered to affect the number of 
repetitions though the relation between it and other recall list properties is left unspecified. 

The difference between LR, RR, MRR, and S/Opt S is a mathematical one of how best to 
allow for the effect on S of a longer recall string. It is difficult to regard the distinctions between 
these four indexes seriously, since they will obviously be highly correlated when r is 
considerably greater than k. Furthermore, since the psychological model involved is primitive 
and undetailed, there seems little point in drawing fine logical distinctions between basically 
coarse measures. 

6. C = (S - min S)/(max S - min S), where min S is zero if r + 1 is not less than twice the 
greatest r_j value, and twice the greatest r_j value minus (r + 1) otherwise. 

This index was proposed by Dalrymple-Alford (1970). It is obviously related to MRR. The 
subtraction of min S is a sensible attempt to allow for the advantage possessed by recall strings 
in which words from one category make up over half the recall list so that there must be some 
repetitions however the words are ordered. This mathematical change has little practical 
importance since min S is usually zero, in which case C has the same value as MRR. The index 
is based on the same psychological model as the previous four, but the attempt to allow for 
extreme values of r_j makes it even more evident that the distribution of r words among the k 
groups in the recall list is not considered to be affected by the construct. 
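
The role of min S in the C index is easiest to see on a list dominated by one category. The sketch below follows the definition given for index 6; the function name is hypothetical, and lists for which max S equals min S are not handled.

```python
from collections import Counter
from itertools import groupby

def c_index(recall):
    """C = (S - min S) / (max S - min S) for a recall sequence of category labels."""
    r = len(recall)
    counts = Counter(recall)
    S = r - sum(1 for _ in groupby(recall))
    max_S = r - len(counts)
    largest = max(counts.values())
    # Some repetitions are forced once one category holds more than half of the list
    min_S = max(0, 2 * largest - (r + 1))
    return (S - min_S) / (max_S - min_S)

print(c_index("AABAA"))   # 0.0: the two repetitions present cannot be avoided
print(c_index("AAAAB"))   # 1.0: fully blocked recall
```

For AABAAACCCBD the largest category holds 5 of 11 words, min S is zero, and C reduces to MRR.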

7. Ds = S - E(S | {r_j}, P_0). This difference score was first proposed by Dallett (1964), and was 
used by Bousfield & Bousfield (1966). The formula for the expected value is given by Mood 
(1940): 


$$E(S \mid \{r_j\}, P_0) = \frac{\sum_j r_j^2}{r} - 1.$$


It represents a different mathematical approach to the problem of allowing for the greater 
number of repetitions that may occur with longer recall strings. In indexes 2 to 6 this allowance 
was made by dividing S by a linear function of r. In Ds, S is reduced by the number of 
repetitions that might be expected if the words in the recall list were ordered randomly. The 
psychological model underlying this approach is the same as for indexes 2 to 6 in that the 
construct is held to influence S, but that its relation to r and the distribution of the r words 
among the categories is left unspecified. 
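
The difference score of index 7 can be computed directly from the recall list. This is a minimal sketch with hypothetical function names; it uses the expected number of repetitions under random ordering, the sum of the squared category counts divided by r, minus 1.

```python
from collections import Counter
from itertools import groupby

def expected_S(counts):
    """Expected repetitions when the recalled words are ordered at random."""
    r = sum(counts)
    return sum(n * n for n in counts) / r - 1

def deviation_score(recall):
    """Observed repetitions minus the number expected under random ordering."""
    counts = Counter(recall).values()
    S = len(recall) - sum(1 for _ in groupby(recall))
    return S - expected_S(counts)

score = deviation_score("AABAAACCCBD")   # 5 - (39/11 - 1) = 27/11
```

A score of zero therefore corresponds to chance-level ordering whatever the length of the list, though the attainable range of the score still grows with r.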

However, the statistical procedures in obtaining the expected value do imply more about the 
mechanism of recall than is evident in the earlier indexes. In their consideration of how the 
expected value of S should be derived, Bousfield & Bousfield (1966) distinguished between 
factors influencing which particular items are recalled and factors influencing the order in which 
these items are stated in the recall list. They wanted to free the expected value from the first set 
of factors. This can be done by deriving the expected value for a situation in which a set of 
items {r_j} is given, and may be placed in any order with equal probability. Thus {r_j} is 
considered to be fixed before ordering commences, if there is no tendency to cluster recall list 
items, and is independent of the order which subsequently emerges. 

Because of the additional implication concerning the mechanism of recall, Ds and indexes 8 to 
10, which also include E(S | {r_j}, P_0), can be regarded as a subset of the second group of 
indexes. Indexes 2 to 6, for which the mechanism of recall is less well specified, form the other 
subset. The two subsets are kept in the one group because they are based on the same model of 
the relations between the construct and S, r, and {r_j}. Researchers who adopt this model must 
decide whether they are prepared to accept the mechanism implied in indexes 7 to 10, or 
whether they would rather use one of indexes 2 to 6 where the mechanism is left vague. 

8. D = [S - E(S | {r_j}, P_0)]/(max S - min S). Dalrymple-Alford (1970) proposed this index at the 
same time as the C index. 

9. ARC = [S - E(S | {r_j}, P_0)]/[max S - E(S | {r_j}, P_0)]. This index, known as the Adjusted 
Ratio of Clustering, was proposed by Gerjuoy & Spitz (1966) and subsequently by Roenker, 
Thompson & Brown (1971). Currently it is probably the most widely used index. 

10. Z = [S - E(S | {r_j}, P_0)]/[var(S | {r_j}, P_0)]^(1/2). This index is advocated by Frankel & Cole 
(1971). It was discussed earlier by Hudson & Dunn (1969) and Dunn (1969). The formula for 
the variance dates back to Mood (1940) and is presented in a more readily usable form by Barton 
& David (1957) and Wallis & Roberts (1957). 
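
Indexes 8 to 10 share a numerator and differ only in how it is scaled, which the following sketch makes explicit. The function name is hypothetical. The variance expression is not reproduced in the text; the form used here is the one commonly given for the number of runs in a random arrangement of several kinds of element (and hence for S = r - T), and should be checked against Mood (1940) before being relied on.

```python
from collections import Counter
from itertools import groupby
from math import sqrt

def adjusted_indexes(recall):
    """D, ARC and Z for a recall sequence of category labels."""
    counts = list(Counter(recall).values())
    r, k = len(recall), len(counts)
    S = r - sum(1 for _ in groupby(recall))
    sq = sum(n ** 2 for n in counts)
    cu = sum(n ** 3 for n in counts)
    expected = sq / r - 1
    # Assumed variance of the number of runs under random ordering (equals var of S)
    variance = (sq * (sq + r * (r + 1)) - 2 * r * cu - r ** 3) / (r ** 2 * (r - 1))
    max_S = r - k
    min_S = max(0, 2 * max(counts) - (r + 1))
    deviation = S - expected
    return {
        "D": deviation / (max_S - min_S),
        "ARC": deviation / (max_S - expected),
        "Z": deviation / sqrt(variance),
    }

indexes = adjusted_indexes("AABAAACCCBD")
```

Only Z expresses the deviation in units of its own chance variability; D and ARC rescale it by the attainable range.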

11. Dy = [S - E(S | r, P_0)]/[var(S | r, P_0)]^(1/2). Dunn (1969) proposed this index. Mathematically Dy 
differs from Z in that r is fixed, while r_j values may vary within the bounds of the requirement 
that Σ r_j = r. The psychological model that is reflected by, or is responsible for, this difference is 
that the construct determines not only the number of repetitions but also the way in which the r 
words recalled are selected from the x categories; the relation between the construct and the 
total number of words is still undefined. The mathematical effect of changing from Z to Dy is to 
attribute greater scores to recall lists where there is unevenness in the r_j values. For instance a 
recall string of AAAAAA would have a higher Dy value than AABBCC (4.34 vs. 1.88) given 
r = 6, x = 3, {p_i} = 6, 6, 6. 

The definition of the construct as tendency to organize is sufficiently general to allow 
organization to be interpreted as either tendency to group many words under one or few 
categories (Dy index) or tendency to utilize several categories in the recall list and group fewer 
words under each category for a given value of r. These differences in interpretation of the 
construct cannot be resolved by appeal to mathematical logic, since the position one adopts 
depends on preference for a particular psychological model, and the choice of model is, as 
Shuell (1975) points out, a subjective matter. 

It should also be noted that, as was discussed under index 7, Ds, the statistical derivation of 
the expected value E(S | r, P_0) implies something about the mechanism of recall. The expected 
value is derived for a situation in which first a fixed number of items, r, is drawn randomly from 
a finite and completely described pool, and subsequently the drawn items are placed in a 
sequence at random. This represents a situation in which the r items are fixed irrespective of {r_j} 
and the order of recall, when there is no tendency to organize. As well as considering the 
relations between the construct and S, r, and {r_j} which form the model on which Dy is based, 
the researcher must also consider the acceptability of this representation of recall 
mechanisms. 
Comparison of indexes 


If the researcher adopts the model underlying index 1 or that underlying index 11, there is no 
further problem of choice since each group contains only one index. But if the second model is 
chosen, a further choice must be made between the nine indexes which it underlies. This further 
choice, and even the initial choice of model, may be influenced by the use to which the index is 
to be put. Four uses were described above, each determining certain properties of the index. 
The suitability of the indexes for each use is discussed below and illustrated with information 
obtained from four sets of data. 


Data sets 


Each set consists of 100 recall lists. Three sets were obtained by Monte Carlo procedures and 
the fourth by presenting a word list to trainee teachers. 

Set 1. x = 4, {p_i} = 10, 10, 10, 10. The probability of recall in each response set of each word in 
the list was arbitrarily set at 0.4. Once it was determined which words were recalled in a 
particular response their order of recall was determined by a random process, so there was no 
tendency to cluster. 

Set 2. The same conditions applied as in set 1, except that {p_i} = 5, 5, 5, 5. In sets 1 and 2, the 
machine generating the recall lists has, in effect, none of the construct tendency to organize 
verbal stimuli. 
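
The generating process for sets 1 and 2 is simple enough to sketch. This is an illustration of the description above, not the original program: the function name, the letter coding of categories, and the seed are assumptions.

```python
import random

def simulate_unorganized_recall(p, recall_prob=0.4, rng=None):
    """One recall list from a 'respondent' with no tendency to organize.

    p lists the stimulus category sizes; each word is recalled independently
    with probability recall_prob and the recalled words are ordered at random.
    """
    rng = rng or random.Random()
    labels = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    recalled = [labels[i] for i, size in enumerate(p) for _ in range(size)
                if rng.random() < recall_prob]
    rng.shuffle(recalled)                 # random order, so no systematic clustering
    return "".join(recalled)

rng = random.Random(0)                    # fixed seed, for reproducibility only
set_1 = [simulate_unorganized_recall([10, 10, 10, 10], rng=rng) for _ in range(100)]
set_2 = [simulate_unorganized_recall([5, 5, 5, 5], rng=rng) for _ in range(100)]
```

With a recall probability of 0.4 the lists in set 1 average about 16 words and those in set 2 about 8, which is the source of the shorter recall lists in set 2.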

Set 3. x = 4, {p_i} = 10, 10, 10, 10. In generating set 3, various degrees of possession of the 
construct are assumed. A model used to guide the production of recall lists was chosen so that 
the number of words recalled, though not their distribution among the categories, and the 
number of repetitions were affected by the construct. The construct was represented by a 
parameter, θ, set randomly for each recall list at a value between 0 and 1. The probability of 
recall of any word was arbitrarily set at 0.2^(1-θ). Once it was determined which words were 
recalled, the probability that the first word in the recall string came from category j was set at 
r_j/r. The probability that the next word at any point in the list came from the same category, j, 
as the previous word, was set at: 


[expression illegible in the source; its legible terms are the number of words already recalled from category j and r - total number of words already recalled] 


The probability that the next word came from another category, i, was set at: 


[expression illegible in the source; its legible terms are the number of words already recalled from category i and r - total no. of words already recalled] 


These details were selected as being not too improbable a model for human processes. Note 
that this process of generating recall lists from a model is the counterpart of the procedure 
advocated by Colle (1972), in which a set of recall lists is studied in an attempt to induce the 
model, or to test the accuracy of a postulated model. One test of the practicality of Colle’s 
procedure would be the elucidation by one researcher of another’s secret model given only a set 
of recall lists. 

Set 4. x = 4, {p_i} = 10, 10, 10, 10. A list of 40 nouns equated for frequency of occurrence in the 
Thorndike & Lorge (1944) word book was read aloud to 100 trainee teachers. The words were 
presented in a randomly determined order at 2 to 3 sec intervals. Five minutes were allowed for 
written free recall. The words in the stimulus list fell into the categories animals, food, 
transport, clothes. 
Correlation between clustering indexes 


If all indexes correlate extremely highly over a range of data sets, there can be little practical 
difference in preferring one index before another. Shuell (1969) observed high correlations 
between several indexes, though the range of indexes compared was not as extensive as those 
considered here. 

Correlation matrices were calculated for the 11 indexes for all four data sets. (Though a 
comparison of the nine indexes in group two is our main aim, analyses were also carried out on 
the single indexes in groups one and three in order to provide additional information about the 
properties of these indexes.) 

The correlation matrix for set 4 is shown in Table 1. Similar patterns of correlations were 
observed for the other three sets. The only noticeable trend is a decrease in the correlations for 
set 2, which could be expected with the much shorter recall lists in that condition. The lowest 
correlation was 0.701, between S and MRR in set 3. Correlations are high, as earlier authors 
stated, but they are not perfect. The nature of the individual research study will determine 
whether it makes any difference which index is chosen. When the decision to accept or reject 
the existence of an effect is a close one, or when there are many short recall lists, different 
indexes may lead to opposite decisions. The conservative conclusion is that the choice of index 
can affect the conclusion and therefore the choice needs to be made on defensible 
grounds. 


Table 1. Correlations between indexes, data set 4 


         S        LR       RR       MRR      S/Opt S  C        Ds       D        ARC      Z
LR       0.915
RR       0.898    0.999
MRR      0.806    0.964    0.972
S/Opt S  0.906    0.999    0.998    0.966
C        0.806    0.964    0.972    1.000    0.966
Ds       0.960    0.924    0.913    0.875    0.918    0.875
D        0.775    0.856    0.857    0.856    0.855    0.856    0.853
ARC      0.825    0.957    0.962    0.979    0.959    0.979    0.892    0.858
Z        0.932    0.962    0.957    0.937    0.959    0.937    0.984    0.885    0.955
Dy       0.955    0.990    0.985    0.934    0.987    0.934    0.956    0.858    0.929    0.972


Earlier we listed four uses which may be made of clustering indexes: 

1. To decide whether a single participant shows any tendency to organize verbal stimuli. 

2. To decide whether a population shows a tendency to organize. 

3. To decide whether different populations possess different degrees of the tendency to 
organize. 

4. To explore the relation between tendency to organize and other constructs. 

This led to deduction of four properties which indexes must have in order to be suitable for 
particular uses. 

1. A normal distribution (not essential but advantageous for uses 2, 3, and 4). 

2. A known probability density function for the condition that there is no tendency to 
organize (use 1). 

3. An expected value under the condition of no tendency to organize (uses 1 and 2). 

4. A monotonic relation with the construct (unobservable but necessary for all four uses). 


Normality 

All 11 indexes are approximately normally distributed for the four sets of data. Evidence for this 
is presented in Table 2 where skewness and kurtosis values are given. Deviations from the 
equivalent values of the normal curve are mild enough for the use of parametric statistics. 


Table 2. Skewness and kurtosis of distributions of indexes 

                     Data set
Index        1       2       3       4

Skewness
S            0.39    0.75    1.24    0.92
LR           0.33    0.61    0.19    0.23
RR           0.33    0.65    0.12    0.19
MRR          0.48    0.69   -0.20    0.06
S/Opt S      0.36    0.64    0.14    0.23
C            0.48    0.75   -0.20    0.06
Ds           0.35    0.58    1.41    0.92
D            0.55    0.63    0.02    0.11
ARC          0.47    0.61   -0.13    0.02
Z            0.42    0.54    0.82    0.52
Dy           0.34    0.70    0.85    0.51

Kurtosis
S            2.26    3.71    3.90    3.54
LR           2.53    3.01    2.33    2.38
RR           2.58    3.00    2.32    2.38
MRR          2.96    2.88    2.44    2.40
S/Opt S      2.68    3.10    2.34    2.45
C            2.96    3.12    2.44    2.40
Ds           2.58    3.40    4.33    3.65
D            3.28    3.22    2.52    2.31
ARC          3.12    3.36    2.25    2.32
Z            2.66    2.97    3.04    2.66
Dy           2.44    3.24    3.17    2.80
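
The skewness and kurtosis values can be illustrated with a short sketch. It assumes the simple moment definitions (third and fourth central moments divided by powers of the second), under which a normal curve has skewness 0 and kurtosis 3; the text does not say which estimators were actually used, and the function name is ours.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and (non-excess) kurtosis of a sample.

    Assumption: population-style moments (division by n), so that a
    normal distribution has skewness 0 and kurtosis 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical index values for a handful of recall lists.
skew, kurt = skewness_and_kurtosis([1, 2, 3, 4, 5])
```

On this reading, the tabled kurtosis values near 3 and skewness values near 0 are what "approximately normal" refers to.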





Probability density function 


Only two indexes, Z and Dy, have a known (or rather, assumed) probability density function for 
the condition that no systematic clustering has occurred. Their form is that of a standardized 
score, and they can be taken as being distributed N(0,1). Expected values of the variance of the 
other indexes are not known, so only Z and Dy can be fit for use 1. 
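
As a sketch of use 1, a single participant's Z (or Dy) can be referred to the standard normal curve. The one-tailed test for positive clustering and the function name are our assumptions; the text states only that the score is taken as N(0,1) when there is no systematic clustering.

```python
from math import erfc, sqrt


def upper_tail_p(z):
    """P(Z >= z) for a standard normal variate (one-tailed, an assumption)."""
    return 0.5 * erfc(z / sqrt(2))


# A hypothetical participant with Z = 1.645 falls at about the 5 per cent point.
p_value = upper_tail_p(1.645)
```

The accuracy of such a p value rests on how well the N(0,1) assumption holds for the recall list in question.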


Expected values 


Under the condition of no tendency to organize, six indexes have expected values that are 
constant for all r. These six may be fit for use 2. They are Ds, D, ARC, Z and Dy, for which the 
expected value is zero, and RR, for which it is a function of {p_i} and x (Frender & Doubilet, 
1974). For Ds, D, ARC and Z, the expected value is for the condition that {r_i} is given, which is 
a statistical reflection of the psychological model to which these indexes belong: the relation 
between the construct and the number of words recalled and their distribution among the 
categories is not specified. For Dy and RR the expected value is for the condition that r is given. 
This is appropriate for the psychological model on which Dy is based, but inappropriate for RR 
if it belongs in the second group of indexes. Perhaps it has been misclassified and should belong 
in the third group with Dy. However, given the historical circumstances of its development it 
seems more appropriate to leave it in the second group as a member of the closely related 
sub-group consisting of LR, RR, MRR, S/Opt S, and C, which are relatively unsophisticated 
attempts to compensate S for varying values of r. 

The remaining five indexes have expected values that are not independent of r or {r_i}, and are 
unfit for use 2. 

All 11 indexes are fit for uses 3 and 4. 


Relation between index and list length 


The three groups of indexes differ in the relations specified between the construct and the 
number of words recalled and their distribution among the categories. These relations should be 
reflected in the values of the indexes. In all indexes except S an attempt is made to compensate 
for the greater number of repetitions that are possible, and likely, with longer recall strings. 
Therefore when there is no tendency to organize there should be no correlation between the 
index and the number of words, r. Data sets 1 and 2 are known to have no systematic clustering, 
so all the indexes, except S, should have effectively zero correlation with r for these sets. The 
actual correlations are shown in Table 3. 


Table 3. Correlations of indexes with number of words recalled 

                       Data set
Index        1        2        3        4

S            0.499    0.359    0.934    0.760
LR           0.149    0.019    0.802    0.482
RR           0.110   -0.032    0.769    0.447
MRR         -0.102   -0.195    0.509    0.302
S/Opt S      0.119   -0.005    0.781    0.467
C           -0.102   -0.179    0.503    0.302
Ds           0.018   -0.015    0.888    0.620
D           -0.024   -0.005    0.618    0.394
ARC         -0.019   -0.018    0.547    0.376
Z           -0.011    0.019    0.851    0.536
Dy           0.091   -0.025    0.867    0.494


Table 3 shows that the indexes generally have the expected correlations for sets 1 and 2. 
Values for the more sophisticated indexes, Ds, D, ARC, Z, and Dy are closer to zero than those 
for the simpler indexes. 
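
Why the raw count S is correlated with r even without clustering can be seen from its chance expectation. The sketch below assumes that a repetition is an adjacent pair of recalled words from the same category and that all recall orders are equally likely; under those assumptions the expected number of repetitions is the sum of r_i(r_i - 1) over categories divided by r. The function name is ours.

```python
def expected_repetitions(category_counts):
    """Chance expectation of S given the numbers recalled per category.

    Each of the r - 1 adjacent positions holds a same-category pair with
    probability sum(r_i * (r_i - 1)) / (r * (r - 1)), which simplifies to
    sum(r_i ** 2) / r - 1 for the whole list.
    """
    r = sum(category_counts)
    return sum(n * n for n in category_counts) / r - 1


# Four categories, with 2, 4 and 6 words recalled from each (hypothetical lists).
for per_category in (2, 4, 6):
    counts = [per_category] * 4
    print(sum(counts), expected_repetitions(counts))
```

The expectation rises steadily with list length, which is the dependence on r that every index other than S tries to remove.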

Data set 3 was established from a model in which a parameter determined not only the number 
of repetitions but also r, the recall list length. Positive correlations between the index and r are 
therefore to be expected. Data set 4, on the other hand, was obtained from natural response 
protocols, and the appropriate model is unknown. In this set the correlations are non-zero, as in 
set 3. This indicates that the human model should be based on the assumption that the construct 
affects the number of words recalled as well as their grouping into categories in the recall 
sequence. No previous evidence for this appears to have been published. 
Relation of indexes in group 2 to an exact probability measure 


Colle (1972) states that indexes such as those discussed here are reflections of the ‘base-line 
theory’: they measure deviations from what might be expected to occur if the construct were not 
present. The notion of unlikeliness is the key concept in the base-line theory: a large value of 
the index is unlikely if there is no tendency to organize; the assumption underlying the use of 
current indexes is that the larger and hence more unlikely the value, the greater the tendency to 
organize. The best measure of unlikeliness for group 2 indexes is the probability of a recall 
string having the observed or greater number of repetitions for a particular {r_i}, if the order of 
items in the recall list is random. The formula for p, the exact probability measure, may be 
derived from the theory of combinatorial chance (see Mood, 1940; David & Barton, 1962; Kelly, 
1973). The statistical sampling assumptions underlying p are common to the more sophisticated 
indexes in group 2 (Ds, D, ARC, Z) and are stated clearly by Bousfield & Bousfield (1966). It is 
assumed that the order of items in the recall list is a random sample of all possible orders given 
{r_i}. The less sophisticated measures are ratio functions relating number of repetitions observed 
to a measure of the maximum number of repetitions possible. In themselves they do not 
incorporate the notion of randomness, but authors such as Bousfield (1953, p. 234) were 
interested in departure from chance expectation and used ratios derived from ‘artificial’ data sets 
with which to compare the value of the ratio obtained. In view of the fact that all the indexes in 
group 2 are based on the idea that large clustering indexes are related to unlikely outcomes and 
hence represent a greater tendency to organize, we contend that the exact probability is a 
yardstick with which the nine indexes in this group should be compared. Indexes from groups 1 
and 3 (S and Dy respectively) are not included in this comparison as they share neither the 
psychological nor the statistical sampling assumptions of group 2. If a number of indexes which 
share the psychological assumptions of Dy existed it would be legitimate to compare these 
indexes to a probability measure, but the statistical sampling assumptions of this measure would 
be different from those underlying the p index used here. Instead of finding simply the 
probability of the number of repetitions for a given set of r_i values, the probability of the 
particular r_i set being obtained with a given value of r would also be included in the measure. 
p(S | r, H0) would be calculated instead of p(S | {r_i}, H0). 
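
The exact probability can be sketched by direct counting rather than by the closed formula of the sources cited, which is not reproduced here. The sketch assumes that a repetition is an adjacent pair of words from the same category and that every order of the recalled items is equally likely given {r_i}; the function names are ours.

```python
from fractions import Fraction
from functools import lru_cache


def repetition_distribution(category_counts):
    """Return {S: number of category-label orders with S repetitions}.

    Every label order corresponds to the same number of item orders, so
    these counts give the chance distribution of S for fixed {r_i}.
    """

    @lru_cache(maxsize=None)
    def extend(remaining, last):
        if sum(remaining) == 0:
            return {0: 1}
        dist = {}
        for cat, n in enumerate(remaining):
            if n == 0:
                continue
            rest = remaining[:cat] + (n - 1,) + remaining[cat + 1:]
            bump = 1 if cat == last else 0  # same category as the previous word
            for s, ways in extend(rest, cat).items():
                dist[s + bump] = dist.get(s + bump, 0) + ways
        return dist

    return dict(extend(tuple(category_counts), -1))


def exact_p(s_observed, category_counts):
    """Probability of s_observed or more repetitions under random order."""
    dist = repetition_distribution(category_counts)
    total = sum(dist.values())
    tail = sum(ways for s, ways in dist.items() if s >= s_observed)
    return Fraction(tail, total)
```

For two categories with two recalled words each, the six label orders give S = 0, 1 and 2 twice each, so exact_p(2, (2, 2)) is 1/3. Direct counting of this kind is practical only for short recall lists.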

Another way of perceiving the usefulness of p is to consider the problem of comparing two 
values of an index. Where the values relate to recall lists with identical {r_i} values, the one with 
the greater value is less likely to occur by chance. However when the values of {r_i} differ, we 
can only assume that the greater value is less likely. That is, we do not know that the index has 
the same distribution for different values of {r_i}. The Z index, for instance, is assumed to be 
N(0, 1) for all {r_i} but we do not know how accurate this assumption is. The exact probability, p, 
on the other hand, does have the same distribution for all {r_i}. If indexes are used to measure 
unlikeliness for different {r_i} values, as indeed is the case, we need to know whether the index is 
identically distributed for all {r_i}. The test of this is the existence of a monotonic relation 
between the index and p, for sets of recall strings in which {r_i} varies: since p has the same 
distribution for all {r_i}, if the index is not perfectly monotonically related to p its distribution 
must vary with {r_i}. The degree of monotonicity in a relation can most conveniently be 
represented by a rank correlation between the quantities. Table 4 shows Spearman rank 
correlations with p for the nine indexes in group 2, i.e. those based on the same psychological 
assumptions. All four data sets are used. 
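
The rank correlation used as the measure of monotonicity can be sketched as follows. Tied values are given average ranks, which is one common convention and an assumption here; the index and p values in the example are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Ranks from 1 upwards, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical index values and exact probabilities for four recall lists.
index_values = [0.1, 0.4, 0.9, 1.5]
p_values = [0.80, 0.50, 0.20, 0.01]
rho = spearman(index_values, p_values)
```

Because larger index values go with smaller probabilities, a perfectly monotonic relation gives a coefficient of -1; Table 4 accordingly reports magnitudes.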
Table 4. Magnitudes of rank correlations of indexes with p(S | {r_i}, H0) 

                     Data set
Index        1        2        3        4

LR           0.961    0.897    0.982    0.979
RR           0.966    0.892    0.971    0.976
MRR          0.952    0.876    0.838    0.878
S/Opt S      0.964    0.873    0.967    0.953
C            0.952    0.885    0.838    0.882
D            0.990    0.971    0.911    0.873
Ds           0.988    0.955    0.982    0.939
ARC          0.988    0.968    0.859    0.909
Z            0.997    0.981    0.995    0.943
N            100      97       100      100





Over the four data sets, including artificial data with and without assumptions of clustering 
and natural data, the Z index is most consistent in its relation to the criterion measure. The 
widely used ARC measure is less satisfactory than Z, particularly where r is small or when 
there is a tendency to organize. 


Conclusion 


In criticizing measures of clustering on the grounds that they are theoretically indefensible, Colle 
(1972) has drawn attention to an important problem: that of determining the relation between an 
unobservable psychological construct and its measure. In order for any measure to be 
acceptable, a monotonic relation between the construct and the measure must be a viable 
assumption. In the case of clustering measures, the underlying construct, tendency to organize 
verbal stimuli in free recall, is both vaguely defined and poorly integrated into theories of human 
memory. Colle has provided some useful directives which may lead to a better understanding of 
memory processes and their measurement, and which may render clustering measures redundant 
through redefinition or displacement of the construct. However, as we have attempted to show, 
it is not uncommon for measures adopted by practical researchers to be out of step with basic 
research and theorizing. The logical outcome of Colle’s recommendation is that attempts to 
measure clustering be abandoned until theoretically defensible measures are developed. 

An alternative approach, the one adopted in this paper, is to recommend that experimenters 
continue to engage in research while recognizing the primitive nature of the psychological scaling 
theory on which clustering measures are based. Within the bounds of this approach Shuell 
argues that use of a particular clustering index depends on the psychological assumptions which 
the researcher is prepared to accept. This argument is sound, and should be considered by 
researchers in choosing between the three groups of indexes which were identified above. Other 
principles are required for making a choice between indexes within a group, which at present is 
necessary only for the overpopulated second group. Of the nine indexes in that group, we 
recommend Frankel & Cole’s Z score, because it has statistical properties which fit it for the full 
range of uses of clustering indexes, and because for all the data sets investigated it was the 
closest of all the indexes in its group to a monotonic relation with an exact probability measure 
which, it is argued, ensures comparability between clustering scores from different response 
lists. Note, however, that this recommendation applies only to the second group of indexes: 
with a different psychological model either S or Dunn’s Dy would be more appropriate. 
Alternatives to the number of repetitions as the basic measure 


Although all the indexes discussed above rely on the number of repetitions, S, as the basic 
measure of clustering, recently Hubert & Levin (1976) proposed two alternatives to S which 
researchers may wish to consider. Effectively these measures are extensions of S, in that one 
allows for continuously variable degrees of association to be specified between words in the 
stimulus list while the other makes use of the amount of separation between two words in the 
protocol lists. 


One measure, Γ, extends S by allowing a weight, which may be of any value, to be assigned 
to each association between pairs of words in the stimulus list. Where these weights are 
specified as 1 for pairs of words which the experimenter sees as belonging to the same category 
and 0 for words from different categories, Γ reverts to S. 

The other measure, Q, extends Γ in turn by allowing variable weights to be assigned to the 
separation distance between a pair of elements in a participant’s protocol list. Where the 
separation weights are specified as 1 for adjacent words and 0 for all others, Q reverts to Γ. 
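
The way the two extensions revert to S can be sketched in a few lines. This is our reading of the verbal description above, not Hubert & Levin's own formulae: the first measure (written Γ here) is taken as the sum of association weights over adjacent pairs in the protocol, and Q as the sum over all pairs of association weight times separation weight, each pair counted once. The function names, word list and categories are hypothetical.

```python
def gamma_index(protocol, association):
    """Sum of association weights over adjacent pairs in the protocol."""
    return sum(association(a, b) for a, b in zip(protocol, protocol[1:]))


def q_index(protocol, association, separation):
    """Sum over all pairs of association weight times separation weight."""
    total = 0
    for i in range(len(protocol)):
        for j in range(i + 1, len(protocol)):
            total += association(protocol[i], protocol[j]) * separation(j - i)
    return total


# Hypothetical stimulus categories and recall protocol.
category = {"dog": "animal", "cat": "animal", "cow": "animal",
            "oak": "tree", "elm": "tree"}
protocol = ["dog", "cat", "oak", "elm", "cow"]


def same_category(a, b):
    """Weight 1 for words from the same category, 0 otherwise."""
    return 1 if category[a] == category[b] else 0


def adjacent_only(distance):
    """Weight 1 for adjacent words, 0 for all other separations."""
    return 1 if distance == 1 else 0


s_from_gamma = gamma_index(protocol, same_category)       # equals S
s_from_q = q_index(protocol, same_category, adjacent_only)  # equals S as well
```

With graded weights in place of the 0 and 1 values, the same two functions give the more general measures.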

Hubert & Levin provide general formulae for the expected values and variances of Γ and Q 
for the case where the r_i values are given, but not for the case where only the total r value is 
given. Since indexes 1 to 10 of those described above either make no use of expected values and 
variances or do require them for the case where the r_i values are given, Hubert & Levin’s 
extensions can readily be applied to these indexes by substitution of Γ or Q for S, though some 
difficulty may be found with MRR, S/Opt S, C, D and ARC which require maximum or minimum 
values of the basic measure. Hubert & Levin (1976, p. 1076) give a Z index based on Γ which is 
a direct parallel of Frankel & Cole’s (1971) Z based on S. The extensions cannot, however, be 
applied to index 11, Dy, since formulae for expected values and variances are not provided for 
the case where only the total r is fixed. 


We suggested above that researchers should choose S, Frankel & Cole’s Z, or Dunn’s Dy, 
depending on the assumptions made about the relation between the construct tendency to cluster 
and the number of words recalled and their distribution among the categories. The researcher 
who chooses S or Z should also consider whether Γ or Q should replace S as the basic measure, 
and if so must determine what weights to assign to the associations and separation distances. At 
present these decisions about Γ and Q are matters of unguided choice, since there has not been 
time for anyone to develop a relevant theory or convention. 

Remembering pictures of real and ‘unreal’ faces: Some practical and 
theoretical considerations 


Hadyn D. Ellis, Graham M. Davies and John W. Shepherd 


Three experiments are reported which explore the effects of lines on faces upon recognition memory. In Expt 
I a comparison of recognition memory for photographs of real faces with memory for Photofit faces 
indicated that memory for the former is significantly better. Experiment II examined one possible 
explanation for the first results by comparing memory for faces with memory for the same faces containing 
lines representing the feature boundaries of Photofit faces. The unlined faces were recognized more readily. 
This result was replicated in Expt III which also indicated that random lines produced the same deleterious 
effect on face memory as the systematic Photofit-type lines. 

The practical and theoretical implications of these results are discussed in relation to forensic face recall 
systems and pattern recognition. 


During the last decade there has been an increasing interest in the various factors associated 
with the perception, storage, and retrieval of faces. The recognition of faces represents, 
perhaps, the ultimate in our ability to differentiate and remember members of a homogeneous 
class of visual patterns. Faces thus constitute a very useful and sensible type of stimulus for the 
study of pattern recognition. This usefulness would be severely limited were face patterns 
analysed by mechanisms different from those involved for other classes of stimuli, but, although 
various claims have been made to this effect, the evidence for them is not very impressive 
(Meadows, 1974; Ellis, 1975). 

Practical interest in face recognition has developed along with the growth of forensic sciences. 
There is currently a great deal of interest in the reliability of face identification (e.g. Buckhout, 
1974; Goldstein, 1975; Devlin Committee Report, 1976), as well as in the effectiveness of some of 
the methods employed by the police for enabling witnesses to recall faces (Ellis, Shepherd & 
Davies, 1975). 

The purpose of the following series of experiments is to provide an example of how 
practical and theoretical problems of face recognition are linked. 


Experiment I 

Aim 

The aim of this experiment is to compare memory for photographs of real faces with memory 
for composite faces constructed with the Photofit system.* 

Although Photofit constructions are made up from photographed pieces of real faces, they do 
not look like genuine faces. This unreality may be due to the presence of the lines formed at the 
boundaries of the five component features, or perhaps, to the relative absence of skin texture 
and colour information normally present in pictures of faces, or indeed, to the fact that, in 
constructing the kit, only a single feature was ever taken from a particular face. 

People are very good at remembering photographs of real faces, usually obtaining 70-90 per 
cent correct in laboratory studies (Goldstein, 1975); but, with the above considerations in mind, 
it seems possible that their recollection of Photofit faces would be less impressive. 

There is a strong practical motive for this investigation, arising from the fact that police 
officers in many forces are required to remember Photofit constructions in the course of their 
normal duties. As yet, no one has examined their capacity to perform the task of remembering 
Photofit constructions. 

* The Photofit system was invented by Jacques Penry and is marketed by John Waddington Ltd of 
Kirkstall. 

The following experiment constitutes a limited study in which individuals’ recognition of 
photographs of real faces will be compared with their ability to recognize Photofit faces. 


Method 


Stimulus material. Faces of 36 clean-shaven men, none of whom wore glasses, were photographed under 
studio conditions. The photographs were taken with the models looking directly into the camera. Black and 
white slides were made of the portraits. 

An experienced Photofit operator then constructed each face using Photofit. She was allowed continual 
viewing of the face during the constructions each of which took 15-30 min to complete. The constructions 
were constrained in that the operator never used the same component twice. This caused some of the 
constructions to be less than satisfactory likenesses, but ensured that, in the recognition experiment, 
performance was not affected in any way by the same component occurring in different Photofit faces. Slides 
were made of the 36 constructions. 

The stimulus material thus consisted of 36 slides of men's faces and 36 Photofit constructions of these 
faces. 


Subjects. Sixty 17 year old police cadets attending a summer course were volunteered to take part in the 
experiment. The total comprised 44 males and 16 females. They were randomly assigned to one of four 
groups each of which contained 11 males and four females. They were tested in these groups. 


Design and procedure. The stimuli were divided into four sets: set A contained photographs of faces 1-18; 
set B contained Photofits of faces 1-18; set C contained photographs of faces 19-36; set D contained 
Photofits of faces 19-36. The four groups of subjects received the following conditions which 
counterbalanced the order of faces and Photofits: group 1: set A followed by set D; group 2: set D followed 
by set A; group 3: set B followed by set C; group 4: set C followed by set B. 

In each case the recognition set comprised all 36 photographs or Photofits. Thus the group were shown 18 
photographs or 18 Photofits each for approximately 5 sec. Then about 5 min later they saw a recognition set 
of 36 photographs or Photofits depending on what they had just been shown, and were told that the original 
18 faces were mixed up among 18 new photographs or Photofits. Each recognition slide was shown for about 
5 sec and the subject was required to write on a response sheet either ‘Yes’ if he or she thought it was a 
face shown earlier, or ‘No’ if it did not seem familiar. 

Following this, the procedure was repeated using the other set of faces allocated to the particular group. 
The order of slides in the recognition sets was randomized and the same order was then used throughout. 

The design allows two measures to be extracted: (1) Identification rate - number of correct ‘Yes’ 
responses to the original 18 slides; (2) False identification rate - number of incorrect ‘Yes’ responses to the 
18 slides not previously shown. 


Results 


Table 1 shows the mean identification (hits) and false identification rates. From these, a 
recognition score was computed for each subject by subtracting false identification rates from 
identification rates. This measure constitutes a rough adjustment for guessing rates in that those 
who tended to adopt a lax criterion for identification would not only have made many correct 
identifications, but also have made more incorrect identifications than those who were more 
cautious in their judgement. The subtraction of false from correct identifications thus produces a 
correction for guessing. 
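
The scoring rule can be sketched as follows. The representation of a response sheet as pairs of (slide shown earlier, subject wrote ‘Yes’) and the function name are our assumptions for illustration.

```python
def score_responses(responses):
    """Return (hits, false identifications, corrected recognition score).

    `responses` is a list of (was_shown_earlier, said_yes) pairs, one per
    recognition slide.
    """
    hits = sum(1 for shown, said_yes in responses if shown and said_yes)
    false_ids = sum(1 for shown, said_yes in responses if not shown and said_yes)
    return hits, false_ids, hits - false_ids


# A hypothetical lax subject who writes 'Yes' to all 36 slides.
lax = [(True, True)] * 18 + [(False, True)] * 18

# A hypothetical cautious subject: 14 correct and 1 false identification.
cautious = ([(True, True)] * 14 + [(True, False)] * 4
            + [(False, True)] * 1 + [(False, False)] * 17)

print(score_responses(lax), score_responses(cautious))
```

The lax subject has the maximum number of hits but a corrected score of zero, which is the sense in which the subtraction adjusts for guessing.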

The mean corrected identification rates (hits minus false identifications) are higher for the 
photographic condition than for the Photofit condition (t test, t = 3.78, d.f. = 58, P < 0.01). 


Table 1. Means and standard deviations of correct identification and false identification rates in 
the Photofit and photograph conditions together with the recognition index obtained by 
subtracting each subject's false identification scores from his correct identification score 

                              Photographs                           Photofits
                      Hits    False      Hits minus        Hits    False      Hits minus
                              identifi-  false identifi-           identifi-  false identifi-
                              cations    cations                   cations    cations
Mean                  15.00   2.07       12.93             13.42   2.28       11.14
Standard deviation     1.68   1.59        2.45              2.32   2.20        3.24


Discussion 

The results indicate that there is a slight but significant advantage in memory for photographs of 
real faces over memory for the sorts of artificial faces produced by Photofit. As yet, we are 
unable to pinpoint the reason for this difference, but the answer may lie between the two main 
hypotheses mentioned in the introduction. These concerned (a) the possible distracting effects of 
the presence of lines at component boundaries in Photofit faces, and (b) the reduction in 
skin-tone information arising from a photographic process aimed at giving all components a 
similar skin colour and texture. 

One effect of the lines present in Photofit faces may be to induce in viewers a serial feature 
analysis which may be a less efficient way of processing physiognomic information than that 
operating under normal circumstances (Ellis, 1975). This hypothesis can be explored by drawing 
on photographs of real faces lines equivalent in thickness and placement to those found on 
Photofit faces, and comparing recognition of these faces with recognition of the same faces 
without such lines. This will be done in the following experiment. 

Before moving on to consider the next experiment it is worth pointing out that, whatever the 
reason for the observed discrepancy, the implication for police work is clearly that it is 
unreasonable to expect officers to remember Photofit faces with the same facility as they show 
for remembering photographs of actual faces. 


Experiment II 

Aim 
The previous experiment revealed that recognition memory for photographs of real faces is 
better than that for Photofit faces. The following experiment is designed to test the hypothesis 
that this difference is due to the presence of lines in Photofit faces that somehow interfere with 
recognition. 

The experiment simply involves comparing one group of subjects’ ability to recognize the 
photographed faces employed in Expt I, with another group's ability to recognize the same faces 
which have had lines drawn on them comparable to those found in Photofit faces. 


Method 


Stimulus material. The 36 faces used in Expt I were also used in Expt II. 

For the experimental condition, lines were drawn to represent the boundaries of Photofit features. Figure 1 
shows one of the faces employed and indicates the placement of lines. Slides were made of faces in both 
lined and unlined states. 


Subjects. Forty female members of the Aberdeen Psychology Department City subject panel were randomly 
assigned to one of the two conditions. They were tested individually. 


Design and procedure. A between-subject design was used because the same faces occurred in both 
experimental conditions. 
For each condition, the subject saw 18 faces, each for 5 sec consecutively projected onto a screen. 
Approximately 5 min later she saw 36 faces, 18 shown earlier and 18 new faces. For each one the subject 
was asked to write ‘Yes’ or ‘No’ to indicate whether she thought it was presented earlier or not. 

Subjects initially shown the unlined faces saw only unlined faces in the recognition section, and similarly 
those presented initially with lined faces only saw lined faces in the recognition set. 


Figure 1. (A) an example of the faces used in all three experiments; (B) the face with lines equivalent to 
Photofit feature boundaries; (C) indicates these lines arranged in a random pattern on the face. 


Results 
Table 2 gives a summary of the results obtained in Expt II. 

As in Expt I, the main test is a comparison of the corrected recognition scores (hits minus 
false identifications) for the lined vs. unlined faces conditions. A t test revealed that the unlined 
condition led to higher scores than the lined condition (t = 2.09, d.f. = 38, P < 0.05). t tests 
applied separately for the hits and false identification scores indicated no differences between the 
two experimental conditions for these measures (hits, t = 1.88; false identifications, t = 1.00). 


Table 2. Means and standard deviations of hits and false identifications together with the 
corresponding means and standard deviations for the hits minus false identification scores obtained in 
Expt II 

                          Unlined photographs                     Lined photographs
                      Hits    False      Hits minus        Hits    False      Hits minus
                              identifi-  false identifi-           identifi-  false identifi-
                              cations    cations                   cations    cations
Mean                  16.20   2.10       14.10             14.95   2.75       12.20
Standard deviation     1.65   2.32        2.65              2.06   1.74        3.09
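
The t value for the corrected scores can be checked from the summary statistics alone. The sketch assumes an independent-groups t test with pooled variance, sample standard deviations, and 20 subjects per condition; the group size is inferred from the 40 subjects and 38 degrees of freedom rather than stated in the text. The function name is ours.

```python
from math import sqrt


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom from summary data."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Corrected recognition scores, unlined vs. lined photographs (Table 2).
t_corrected, df = t_from_summary(14.10, 2.65, 20, 12.20, 3.09, 20)

# False identifications, lined vs. unlined photographs (Table 2).
t_false, _ = t_from_summary(2.75, 1.74, 20, 2.10, 2.32, 20)
```

Under these assumptions the corrected-score comparison comes out at about 2.09 on 38 degrees of freedom and the false-identification comparison at about 1.00, in line with the values reported.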


Discussion 


The results are consistent with the hypothesis that the presence of lines marking the boundaries 
of facial features impairs memory for faces. This finding does not necessarily exclude other 
factors to account for the results obtained in Expt I, but suggests that one of the major reasons 
for the discrepancy between memory for photographs of real faces and Photofits may be that 
Photofits are very obviously composites of five different features. The boundaries of the five 
features tend to break up the integrity of these faces which in turn may reduce the efficiency 
with which they are memorized. 


Experiment III 

Aim 

So far we have demonstrated that the presence of lines demarcating features of faces can disrupt 
memory. The lines used have fragmented faces into units which are themselves integral and 
easily labelled verbally. The question to be answered by the following experiment is whether the 
impairment in face memory occurs only when lines break up the face into identifiable features, 
or whether randomly placed lines will have the same disrupting effect. 


Method 


Stimulus material. The 36 black and white portraits of men’s faces used in Expts I and II were also 
employed in Expt III. Extra prints of the faces were made and lines drawn in black ink upon them according 
to a pattern determined by a process of random allocation (see below). Slides were then made of the prints. 

The same number of lines (8) were used as are found in Photofit composites and Photofit-type faces of 
Expt II. They were also matched for length of lines. The randomization procedure involved dividing the 
contour of a face into compass points and taking pairs of numbers from a table of random numbers and 
arranging each line between two appropriate pairs of compass points (obviously points which were nearly 
adjacent were not ‘appropriate’). Figure 1 shows one of the faces covered by the random lines. 
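
The randomization procedure can be sketched as a rejection rule on pairs of random numbers. The number of compass points (16), the minimum separation that counts as ‘appropriate’ (3 points), and the use of a pseudo-random generator in place of a printed table are all our assumptions; the matching of line lengths is not modelled.

```python
import random


def random_line_endpoints(n_lines=8, n_points=16, min_gap=3, rng=None):
    """Draw pairs of compass points for the lines, rejecting near-adjacent pairs."""
    rng = rng or random.Random()
    lines = []
    while len(lines) < n_lines:
        a = rng.randrange(n_points)
        b = rng.randrange(n_points)
        gap = abs(a - b)
        gap = min(gap, n_points - gap)  # distance measured around the contour
        if gap >= min_gap:
            lines.append((a, b))
    return lines


# Eight lines for one hypothetical face, with a fixed seed for repeatability.
lines = random_line_endpoints(rng=random.Random(0))
```

Each accepted pair names the two points on the face contour between which one line would be drawn.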


Subjects. The subjects were 81 students drawn from undergraduate classes. They were allocated randomly to 
one of three groups (27 subjects in each group). 


Design and procedure. There were three experimental conditions: (1) clear faces; (2) Photofit-type lined 
faces; (3) random lined faces. 

The same 36 faces were used in each condition. Eighteen were chosen as target faces for initial 
presentation and 18 were used as distractor faces in the recognition tests. The random orders of presentation 
for faces in the target set and target distractor set were the same for all three conditions. 
Each group was told that they would be shown slides of faces each occurring for 5 sec and that they 
would be asked to recognize these same faces when mixed with other faces after a 5 min interval. Groups 2 
and 3 were also warned that there would be lines on the faces and that they had been placed there 
deliberately. 

In the recognition test subjects were asked to decide whether each of 36 faces was one shown earlier or 
not. 


Results 


The mean results for the three groups are shown in Table 3. The mean corrected recognition 
scores for the three groups are also depicted in Fig. 2. 


Table 3. Hits, false identifications and corrected recognition scores (hits minus FIs) obtained for the 
three conditions in Expt III 

                                 Hits     False             Hits minus false
                                          identifications   identifications
Photographs
  Mean                           16.67    1.07              15.60
  Standard deviation              1.18    1.30               1.82
Photofit-type photographs
  Mean                           14.78    1.30              13.48
  Standard deviation              1.95    1.82               2.99
Random-lined photographs
  Mean                           14.41    1.21              13.20
  Standard deviation              1.69    1.38               2.64





An analysis of variance on the corrected recognition score (hits minus false identification) 
indicated a main effect for the three conditions (F = 7.4, d.f. = 2, 78, P < 0.01). Tukey tests 
(α = 0.05) revealed that this was due to the normal faces condition producing better recognition 
performance than either of the lined faces conditions. There was no difference in the recognition 
scores of the groups given Photofit-type lined faces and random-lined faces. 
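
The arithmetic behind the corrected score and the degrees of freedom can be checked with a short sketch. It uses only the group means from Table 3 and the group sizes given under Subjects; the function names are illustrative and are not taken from the paper.

```python
# Group means from Table 3 (18 target faces and 18 distractors per subject).
table3_means = {
    "photographs": {"hits": 16.67, "false_ids": 1.07},
    "photofit_lined": {"hits": 14.78, "false_ids": 1.30},
    "random_lined": {"hits": 14.41, "false_ids": 1.21},
}


def corrected_score(hits, false_ids):
    """Corrected recognition score: hits minus false identifications."""
    return hits - false_ids


def anova_df(n_groups, n_per_group):
    """Degrees of freedom for a one-way between-subjects analysis of variance."""
    total = n_groups * n_per_group
    return n_groups - 1, total - n_groups


corrected = {
    name: round(corrected_score(**row), 2) for name, row in table3_means.items()
}
df_between, df_within = anova_df(n_groups=3, n_per_group=27)

print(corrected)
print(df_between, df_within)
```

The three corrected means reproduce the last column of Table 3, and the degrees of freedom agree with the reported d.f. = 2,78.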


General discussion 

The data from Expt III indicate that the presence of lines per se as opposed to lines bounding 
specific facial features somehow produces an impairment in memory for faces. Thus it would 
seem that the impairment in memory obtained for Photofit faces with Photofit-type lines was 
most likely caused simply by the presence of lines, regardless of where they occurred on the 
face. 

The practical implication of this conclusion is that forensic devices for depicting faces should 
aim to avoid the presence of lines or obvious boundaries between components. In this regard, 
the Identikit system wherein transparent overlays of sketched features are used to build up a 
face is better than the Photofit system. That is not to say that the comparison would be 
favourable to Identikit in all respects: no formal test of the two systems has ever been 
published. 

The theoretical implications of the results presented in this paper are not easily defined. It 
would appear that the presence of lines does disrupt either the encoding or storage of faces, 
which may indicate that faces are processed in a holistic manner and that lines interfere with this 
process. It is possible that other patterned visual inputs would be similarly interfered with, but 
we have no evidence on this as yet. 

Figure 2. Results of Expt III showing the corrected recognition scores for remembering pictures of faces, 
faces with Photofit-type lines, and faces with random lines. 

Luria (1973, p. 117) presents evidence that patients with damage to the secondary occipital 
zones may display visual agnosia only when a simple picture is criss-crossed with lines. The 
present experiments indicate that normal people show some impairment when a complex figure is 
crossed by irrelevant lines. As mentioned earlier, however, it is not clear from these results 
whether the lines interfere at the reception of the input or during memory storage. 

In reviewing the literature on face recognition, Ellis (1975) discusses various strategies people 
may use when viewing a face. For example, he cites the work of Noton & Stark (1971) who 
showed that people tend to make the same eye movements when viewing a particular scene 
again and again. If such a procedure has evolved for the processing of faces, then, conceivably, 
the presence of lines could disrupt the normal mode of picking up physiognomic detail, which in 
turn may impair recognition memory because of inefficient initial encoding. 

The present experimental paradigm, in which the nature of the presentation and recognition 
faces remains the same, may easily be modified to investigate what happens when a face seen 
under one set of conditions is later presented under another set. The real-life parallel to such a 
study is the case where someone adopts a disguise in order to evade identification. Sometimes 
the disguise involves dramatic changes in hair colour and style and the addition or removal of 
facial hair; but on other occasions the simple addition or removal of spectacles can be effective. 
It may be that spectacles disrupt the usual processing of face information. 

There has been no published research on disguise effectiveness, but one way of examining the 
problem may be to investigate the effects of placing lines on faces which have been initially seen 
without lines (addition disguise), and comparing this with the effects on recognition of removing 
the lines from the face shown initially with lines. 

These and related questions concerning disguise are currently under investigation. 


Kinaesthetic after-effect, organismic state and retest reliability 


Irene Wolk Kostin, A. Harvey Baker, Brian L. Mishara and Laurence Parker 


Earlier research demonstrated that exposure to the first administration of the kinaesthetic after-effect (KAE) 
task differentially biases second (and later) session KAE scores. This bias results in low retest reliability. 
Here we explore the hypothesis that variation in organismic state also affects KAE retest reliability. A 
significant retest correlation was found for subjects classified as consistent in organismic state (r = 0.349), but 
not for those subjects classified as inconsistent in organismic state (r = -0.076). If, as these findings indicate, 
KAE scores reflect state variance, can they also constitute an adequate measure of trait? None of the 
relevant prior one-session KAE validity studies controlled for state variance, yet most had significant 
findings. Although KAE scores are evidently influenced by state variables, this does not contraindicate 
continued use of KAE as a measure of trait. 


The major direction for kinaesthetic after-effect (KAE) research in recent years has been the 
study of individual differences. Petrie has been one of the leaders (e.g. Petrie, Collins & 
Solomon, 1958; Petrie, Holland & Wolk, 1963; Petrie, 1967). In this area, she has postulated that 
KAE taps individual differences in a mechanism which modulates the intensity of incoming 
stimulation: individuals at one extreme (called ‘Reducers’) maximally damp down or reduce the 
subjective intensity of incoming stimulation; individuals at the other extreme (called 
‘Augmenters’) maximally augment or magnify stimulus intensity. The present study tests the 
hypothesis, arising from this line of research, that the test-retest reliability of KAE varies with 
variation in organismic state. This hypothesis is relevant to (a) a recent controversy surrounding 
the issue of KAE’s retest reliability and (b) findings regarding the effect of organismic state on 
KAE. Before turning to these issues, we will briefly describe KAE and then summarize the 
evidence which supports the hypothesis that KAE measures stimulus intensity modulation. 

Kinaesthetic after-effect refers to changes in judgements of the width of a block which occur 
after rubbing a block of a different width. Typically, the width of a standard test block (T) is 
judged both before (pretest) and after (test) the rubbing of an after-effect induction block (I) 
which is either larger than (I > T) or smaller than (I < T) the standard. The KAE score is the 
difference between test and pretest. Petrie’s (1967) procedure has a subject rub I with one hand 
while he/she rests the other. In another KAE variant, the subject simultaneously rubs two I 
blocks — a large one with one hand, a smaller with the other (e.g. Cavonius, Hilz & Chapman, 
1974). This paper is limited to issues concerning the one-hand procedure and does not consider 
the two-hand variant. 

Evidence that KAE validly assesses stimulus intensity modulation includes such findings as: 
KAE relates to pain tolerance (Petrie et al. 1958, Expt 1; Poser, 1960; Sweeney, 1966; Ryan & 
Foster, 1967); to tolerance for sensory deprivation (Petrie et al. 1958, Expt 2; Sales, 1971); to 
stimulation seeking (Sales, 1972); to delinquency (Baker, Mishara, Kostin & Parker, 1974); to 
tendencies to plan ahead and to attitudes towards death (Mishara, Baker & Kostin, 1972); and to 
lifestyle in the elderly with special reference to degree of social engagement (Mishara & Baker, 
1974, 1978c). 


KAE reliability controversy 


Petrie (1967) assessed alternate form retest reliability in three samples by administering 
large-block KAE (I > T) in session 1 and small-block KAE (I < T) in session 2. Although she 
reported moderate (0.40) to high (0.78) correlations, other researchers failed to replicate this 
finding (i.e. r = 0.27, Broadhurst & Millard, 1969; r = -0.34, significant in the wrong direction, 
Morgan, Lezard, Prytulak & Hilgard, 1970; and r = 0.19, Salter, 1972). Moreover, Becker (1960), 
Morgan & Hilgard (1972), Platt, Holtzman & Larson (1971) and Weintraub, Green & Herzog 
(1973)* failed to find significant retest reliability when the identical form of KAE was 
administered on two occasions. Spitz & Lipman (1960) did find significant identical form 
reliability (r = 0.74), but their procedure differed from all other studies because they used special 
training procedures to make pretest scores comparable on the two occasions. 

Based primarily on these findings of very low KAE retest reliability, a consensus has emerged 
that KAE should be dropped as a measure of an enduring personality trait (Platt et al. 1971; 
Morgan, 1972; Morgan & Hilgard, 1972; Weintraub & Herzog, 1973; McDonald, 1974). Petrie 
(1974) attempted to rebut this conclusion by dismissing studies which reported inadequate 
reliability because they omitted certain necessary precautions (detailed in Petrie, 1967, pp. 
106-127; e.g. 45 min finger rest before KAE testing; at least two days between testing sessions, 
etc.). 


A new view is emerging (Baker, Mishara, Kostin & Parker, 1974, 1976; Mishara & Baker, 
1978 a, b; Baker, Mishara, Parker & Kostin, in press) which differs sharply from both Petrie 
(1974) and the critics. This view states: (a) only first session KAE scores validly assess 
responsivity to incoming stimulus intensity; and, (b) poor KAE retest reliability reflects 
systematic individual differences in practice effects, which arise from exposure to first session 
KAE, and which are carried over to and then bias second (and later) administrations. A general 
practice effect (change in group mean), long known to occur in KAE (e.g. Bakan & Thompson, 
1962; Hilgard, Morgan & Prytulak, 1968), would, of course, not change the test-retest 
correlation, for this would, in effect, add a constant to one set of scores entered into a 
correlation. However, systematic individual differences in bias would influence the retest 
reliability correlation (McNemar, 1969, pp. 166-168) since such bias would not shift scores by a 
constant amount. Direct evidence of systematic individual differences in (i.e. differential) bias on 
KAE was first reported by Bakan & Thompson (1962) and was later confirmed by Baker et al. 
(1976) and Baker et al. (in press). The two latter studies, besides demonstrating differential bias 
for KAE, also show that the locus of this bias is in second (and later) session pretest trials. 
More specifically, there are changes from session 1 pretest trials to session 2 pretest trials that 
are systematically related to individual differences in the size and direction of the initial 
after-effect score (holding initial pretest level constant). Such differential bias, we have 
contended, underlies the poor retest reliability that has been reported for KAE. 
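
The distinction between a general practice effect and differential bias can be illustrated with a small simulation. This is only a sketch under stated assumptions: the scores are randomly generated, and the shift of 1.5 and the bias rule (a shift proportional to each subject's session 1 score) are hypothetical choices made for illustration, not estimates from any KAE study.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical session 1 scores and a session 2 that differs only by noise.
session1 = [rng.gauss(-2.5, 5.0) for _ in range(65)]
session2_base = [s + rng.gauss(0.0, 2.0) for s in session1]

# General practice effect: the same constant is added to every subject.
session2_general = [s - 1.5 for s in session2_base]

# Differential bias (hypothetical rule): the shift depends on the
# subject's own session 1 score, so it is not a constant.
session2_differential = [s2 - 1.2 * s1 for s1, s2 in zip(session1, session2_base)]

r_base = pearson_r(session1, session2_base)
r_general = pearson_r(session1, session2_general)
r_differential = pearson_r(session1, session2_differential)

print(round(r_base, 3), round(r_general, 3), round(r_differential, 3))
```

Adding a constant leaves the correlation unchanged, whereas the score-dependent shift alters it, which is the point made in the text about why only differential bias can depress a retest correlation.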

This new view contrasts with Petrie’s (1974). Whereas she simply dismisses findings of 
inadequate retest reliability as reflecting poor methodology, we accept these findings. We argue 
that these findings indicate a differential bias which is systematically associated with KAE and 
would even be found if Petrie’s (1967) exact procedures were used. 

Our position also differs from that of the critics who reject KAE because of its poor retest 
reliability. The critics have ignored the admonition of such statisticians as McNemar (1969) who 
caution against the use of test-retest reliability precisely because the second administration of a 
test can be differentially affected by practice or memory effects. We have argued that inadequate 
KAE retest reliability simply reflects systematic individual differences in bias in second- (and 
later-) session KAE scores (e.g. Baker et al. 1976; Mishara & Baker, 1978a, b). Session 1 KAE 
is unbiased and there is strong evidence supporting its validity: this is seen in our validity review 
above, each study cited being a single session study (cf. also Baker et al. 1976, Table 1). There 
is thus no basis for rejecting the use of session 1 KAE as an index of personality functioning. 


* In actuality, this study involved seven different sessions. To obtain data comparable to all the other 
studies cited in the text, we have ourselves computed the retest (identical form) reliability coefficients 
between session 1 and session 2 for the I > T (n = 35) and I < T (n = 34) groups. The correlations were 0.02 
and 0.17 respectively. We thank Daniel Weintraub for making these data available to us. 


The effect of organismic state on KAE 


Although we believe that differential bias contributes significantly to KAE’s poor retest 
reliability, it is possible that KAE’s reliability is sensitive to more than one factor. Perhaps 
failure to control for organismic state also has lowered the retest correlations in prior studies. 
While numerous studies, reviewed above, show that KAE measures a personality trait, there are 
also studies which show that KAE is sensitive to variation in organismic state. Petrie (1967) 
reported that such state altering variables as alcohol, aspirin, and audioanalgesia (loud white 
noise) significantly affect KAE performance. Poser (1960) reported that amobarbital sodium 
affects KAE. 

Petrie’s (1967) and Poser’s (1960) findings are based on a two-session KAE procedure. The 
bias effect discussed above makes any two-session KAE study methodologically suspect. 
However, one-session studies also demonstrate that state altering variables such as menstrual 
cycle (Baker, Mishara, Parker & Kostin, 1974) and dexedrine and phenobarbital (Gupta, 1974) 
significantly affect KAE performance. 

Based on this evidence, we hypothesized that identical form KAE retest reliability is higher for 
those subjects whose organismic state is identical (or nearly so) on two testing occasions 
(consistent state group) than for those whose organismic state is different on two occasions 
(inconsistent state group). Beyond this central hypothesis, we tested for differential bias in both 
the consistent state and inconsistent state groups. We also performed analyses which assessed 
the plausibility of an alternate interpretation (see Results) of the findings reported here. 


Method 
Apparatus 


The apparatus consisted of a 2 in (5.08 cm) wide test block, a 2.5 in (6.35 cm) wide inducing block, and a 30 
in (76.20 cm) long tapered comparison wedge. 


Procedure 


Since Petrie (1974) has challenged earlier studies for not following her exact procedures, it seems important 
to specify the operations that we used to achieve as exact a replication of her procedures as possible. Her 
oral instructions to the subjects were used verbatim (Petrie, 1967, pp. 117-119). An associate of Petrie’s, T. 
Holland, who was trained by Petrie and who collaborated with her in earlier KAE research, demonstrated 
Petrie’s KAE procedure in detail to two of us (AHB & BLM). The experimenter was trained until she was 
quite proficient in following the procedure. Then, the third author (IWK), who was herself trained by and 
formerly worked as a colleague with Petrie, reviewed the experimenter’s performance. 

Each KAE session consisted of 18 judgements. On each trial, the blindfolded subject (who never saw any 
of the apparatus) held the test block with the thumb and forefinger of the right hand and indicated its width 
on the tapered block with the thumb and forefinger of his/her left hand. First, two practice judgements and 
then four pretest judgements (pretest trial block) of the width of the test block were made. Then each 
subject underwent a 90 sec after-effect induction period in which he/she rubbed a 2.5 in wide inspection 
block with the thumb and forefinger of the right hand while resting the left hand. Four test judgements (test 
trial block 1) then were made. Two more periods of after-effect induction (90 sec and 120 sec) ensued. Each 
period was followed by a set of four test judgements (test trial blocks 2 and 3, respectively) which resulted 
in a total of 12 test trials in all. The mean of the 12 test judgements minus the mean of the four pretest 
judgements constituted the KAE score. This task was administered twice, with an intervening period of two 
or three days. The experimenter was blind to the hypothesis. 
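
The scoring rule just described can be written out as a short sketch. The judgement values below are invented for illustration (in mm), and the function name is ours; only the rule itself (mean of the 12 test judgements minus mean of the four pretest judgements) comes from the procedure above.

```python
from statistics import mean


def kae_score(pretest, test_blocks):
    """KAE score: mean of all test judgements minus mean of pretest judgements.

    pretest     -- the four pretest judgements
    test_blocks -- three lists of four test judgements (test trial blocks 1-3)
    """
    if len(pretest) != 4:
        raise ValueError("expected four pretest judgements")
    if len(test_blocks) != 3 or any(len(block) != 4 for block in test_blocks):
        raise ValueError("expected three test trial blocks of four judgements")
    test_trials = [judgement for block in test_blocks for judgement in block]
    return mean(test_trials) - mean(pretest)


# Hypothetical width judgements for one subject in one session (mm).
pretest = [50.0, 51.0, 49.0, 50.0]
test_blocks = [
    [48.0, 47.0, 48.0, 49.0],  # after the first 90 sec induction
    [47.0, 46.0, 47.0, 48.0],  # after the second 90 sec induction
    [46.0, 47.0, 46.0, 45.0],  # after the 120 sec induction
]

score = kae_score(pretest, test_blocks)
print(score)  # -3.0
```

A negative score, as in this invented example, means the test block was judged narrower after rubbing the wider block.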

Prior to the KAE procedure, subjects rested their hands for 45 min so that nothing touched their thumbs 
and forefingers. During this time, they were asked questions about the five different types of state variables 
which Petrie (1967) postulated would affect KAE performance. The specific questions used concerned (a) 
degree of tiredness now as compared to usual degree of tiredness (rated on a seven-point scale); (b) presence 
vs. absence of any feeling of sickness or illness; (c) presence vs. absence of the experience of pain; (d) 
presence vs. absence of skin rash on hands; (e) ingestion vs. non-ingestion of any kind of medication, drug 
or alcoholic beverage in the past 12 hr. (Some researchers have suggested that variation in menstrual cycle, 
another type of state variable, affects KAE scores (Petrie, 1967) and/or the modulator mechanism 
(Silverman, Buchsbaum & Henkin, 1969). Although we had data regarding menstrual cycle in this study, it 
was never used as a criterion for consistent vs. inconsistent state classification because of the nature of the 
design. Since each subject was tested at intervals of no more than three days, a subject at a given place in 
the menstrual cycle in session 1 would be at most three days further along in the cycle in session 2. 
This is not a large difference and our own findings regarding the relationships between menstrual cycle and 
KAE indicate that a difference of three days would have very little effect on KAE scores (Baker, Mishara, 
Parker & Kostin, 1974). Thus, the design makes all females ‘consistent’ in menstrual cycle across the two 
sessions.) 


Classification of subjects 
Subjects were 65 (31 male and 34 female) right-handed college students from Wayne State University, 
Detroit, Michigan. 

Without knowledge of the KAE scores, subjects were assigned to the consistent state group (n = 32) if (a) 
their tiredness ratings were no more than two points apart on the seven-point scale and (b) their answers to 
the questions regarding illness, pain, hand rash and drugs were identical on the two testing occasions. 
Subjects not meeting these criteria were assigned to the inconsistent state group (n = 33). 
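
The classification rule can be stated compactly as a sketch. The record layout and names below are our own assumptions; the rule itself (tiredness ratings no more than two points apart, and identical answers on the four yes/no items) is the one given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateReport:
    """Self-report of organismic state for one session (hypothetical layout)."""
    tiredness: int   # seven-point rating, 1 to 7
    ill: bool        # any feeling of sickness or illness
    pain: bool       # any experience of pain
    hand_rash: bool  # skin rash on hands
    drugs: bool      # medication, drug or alcohol in the past 12 hr


def classify(session1, session2, max_tiredness_gap=2):
    """Return 'consistent' or 'inconsistent' for a pair of session reports."""
    same_answers = (
        (session1.ill, session1.pain, session1.hand_rash, session1.drugs)
        == (session2.ill, session2.pain, session2.hand_rash, session2.drugs)
    )
    tiredness_close = abs(session1.tiredness - session2.tiredness) <= max_tiredness_gap
    return "consistent" if same_answers and tiredness_close else "inconsistent"


# Invented examples.
a = StateReport(tiredness=4, ill=False, pain=False, hand_rash=False, drugs=False)
b = StateReport(tiredness=6, ill=False, pain=False, hand_rash=False, drugs=False)
c = StateReport(tiredness=4, ill=False, pain=True, hand_rash=False, drugs=False)

print(classify(a, b))  # consistent
print(classify(a, c))  # inconsistent
```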


Results 
Test of the main hypothesis 


The test-retest correlation for the consistent state group was 0.349 (d.f. = 30, P < 0.03, 
one-tailed) while the test-retest correlation for the inconsistent state group was -0.076 
(d.f. = 31, n.s.). (One-tailed tests were employed in this paper since specific directional 
hypotheses were involved.) The results of a test of the difference between these two correlations 
yielded a t value of 1.654 (d.f. = 62, P = 0.052), which closely approaches the conventional level 
of significance. Considered as a set, these findings indicate that self-reported variation in 
consistency of organismic state relates to variation in retest reliability. 
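
For readers who wish to check these figures, the following sketch computes the t statistic for a single correlation and a comparison of two independent correlations. The comparison uses Fisher's z transformation, which is an assumption on our part: the text does not name the test that produced t = 1.654, and the Fisher statistic (about 1.69) differs slightly from it.

```python
import math


def t_for_r(r, n):
    """t statistic and degrees of freedom for testing a correlation against zero."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r), df


def fisher_z_difference(r1, n1, r2, n2):
    """Fisher z test for two independent correlations.

    Returns the z statistic and its one-tailed normal probability.
    This is a standard procedure, not necessarily the one used in the paper.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_one_tailed


# Values reported in the text: consistent group n = 32, inconsistent n = 33.
t_consistent, df_consistent = t_for_r(0.349, 32)
z_diff, p_diff = fisher_z_difference(0.349, 32, -0.076, 33)

print(round(t_consistent, 2), df_consistent)
print(round(z_diff, 2), round(p_diff, 3))
```

The degrees of freedom for the single correlation (n minus 2) agree with the reported d.f. = 30.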


Some secondary findings 


Does the size of KAE (group mean) differ for subjects consistent in organismic state compared 
to subjects inconsistent in organismic state? For session 1, the mean KAE for the two groups 
was almost identical: consistent state group, M = -2.49 mm, S.D. = 5.37 mm; inconsistent state 
group, M = -2.46 mm, S.D. = 5.19 mm. For session 2, the mean KAEs for the two groups 
differed slightly but not significantly: consistent state group, M = -3.90 mm, S.D. = 3.42 mm; 
inconsistent state group, M = -2.91 mm, S.D. = 3.57 mm (t = 1.13, d.f. = 63, n.s.). 

A second question is, was the differential bias effect, described earlier, present in either or 
both of the two state groups? Baker et al. (in press), and Mishara & Baker (1978a) have 
established that an appropriate formula for assessing differential bias or carry-over effects from 
session 1 to the pretest of session 2 is the following partial correlation, 


$r_{(\text{Test 1})(\text{Pretest 2}) \cdot \text{Pretest 1}}$    (1) 


For the consistent state group, this partial r was 0.49 (d.f. = 29, P < 0.005). For the inconsistent 
state group, it was 0.70 (d.f. = 30, P < 0.001). These results show that substantial differential bias 
effects occur for both consistent and inconsistent state groups. 
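
A first-order partial correlation of this kind can be computed from the three pairwise correlations. The sketch below uses the standard textbook formula; the data are invented, and the function names are ours rather than anything given in the studies cited.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


def differential_bias(test1, pretest2, pretest1):
    """Partial r between session 1 test and session 2 pretest, holding session 1 pretest constant."""
    return partial_r(
        pearson_r(test1, pretest2),
        pearson_r(test1, pretest1),
        pearson_r(pretest2, pretest1),
    )


# Invented per-subject means (mm) for five hypothetical subjects.
test1 = [48.0, 47.0, 50.0, 46.0, 49.0]
pretest2 = [49.0, 47.0, 51.0, 46.0, 50.0]
pretest1 = [50.0, 49.0, 51.0, 50.0, 48.0]

bias = differential_bias(test1, pretest2, pretest1)
print(round(bias, 3))
```

A first-order partial correlation has n minus 3 degrees of freedom, which agrees with the reported values of 29 and 30 for groups of 32 and 33 subjects.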

A third question is, do the consistent and inconsistent state groups differ in degree of bias? 
No. A test of the difference between the two above partial correlations is not significant 
(t = 1.24, d.f. = 62, P > 0.10). 


Test of an alternative hypothesis 

In terms of the distribution of scores, the higher correlation for the consistent state subjects 
indicates that individuals in this group had more similar KAE scores on the two testing 
occasions. Our interpretation is that the greater similarity on KAE reflects greater similarity in 
organismic state. However, there is a possible alternative interpretation which holds that this 
difference in retest correlation between the consistent and inconsistent state groups reflects 
differences in stylistic tendencies to be consistent or inconsistent in perceptual reports, rather 
than differences in state. (We extend thanks to the reviewer of this article for pointing out this 
alternative formulation.) Proponents of this view would argue that subjects who have a strong 
tendency to be consistent in perceptual reports would tend to be consistent across sessions both 
(a) in their perceptual reports about aspects of organismic state, such as tiredness, illness, or 
pain (based on information from their viscera), and (b) in their percepts about width (based on 
information from their kinaesthetic senses). Further, subjects inconsistent in their perceptual 
reports would be inconsistent across sessions both (a) in their self-reports regarding organismic 
state and (b) in their width judgements. If the ‘consistency of perceptual report’ interpretation 
were valid, the consistent self-reports about state would only be one aspect of a general pattern 
of consistent responding. Likewise, inconsistent self-reports about state would reflect a general 
pattern of inconsistent responding in making perceptual judgements. 

One way of testing which hypothesis is more tenable is to see if the consistent state group 
showed greater consistency of perceptual reports in a situation in which organismic state would 
not be a factor. We compared the two groups on their consistency in KAE scores within each 
session, since within a session, there is presumably little variation in organismic state. Our index 
of consistency was a measure of within-session KAE variability based upon the standard 
deviation of the following three KAE scores: mean of test trial block 1 minus mean of pretest 
trial block; mean of test trial block 2 minus mean of pretest trial block; mean of test trial block 3 
minus mean of pretest trial block. For each subject, the standard deviation of the above three 
within-session KAE scores was taken as an index of within-session KAE variability and was 
computed separately for sessions 1 and 2. For session 1, the consistent state group (M = 2.40 
mm, S.D. = 1.53 mm) did not differ significantly (t = 0.15, n.s.) from the inconsistent state group 
(M = 2.46 mm, S.D. = 1.74 mm) on this index of variability. For session 2, the consistent state 
group (M = 1.95 mm, S.D. = 1.08 mm) again did not differ significantly (t = 0.22, n.s.) from the 
inconsistent state group (M = 1.89 mm, S.D. = 1.20 mm). 

As a double check on the analyses just reported, we also explored this same issue using 
another measure of variability of perceptual report, namely, the consistency with which 
subjects made judgements within the four trial blocks (each composed of four trials) for each 
session. To assess variability, the standard deviations for pretest trial block, test trial block 1, 
test trial block 2 and test trial block 3 were computed and the mean of these four standard 
deviations was obtained for each subject for each session separately. The consistent state group 
did not differ significantly from the inconsistent state group on this measure (session 1, t = 0.71, 
n.s.; session 2, t = 0.00, n.s.). 
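
The two within-session variability indices described above can be sketched as follows. The judgements are invented, and the use of the sample standard deviation (n minus 1 in the denominator) is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def block_kae_sd(pretest, test_blocks):
    """First index: S.D. of the three block-wise KAE scores.

    Each score is the mean of one test trial block minus the pretest mean.
    """
    baseline = mean(pretest)
    block_scores = [mean(block) - baseline for block in test_blocks]
    return stdev(block_scores)


def mean_within_block_sd(pretest, test_blocks):
    """Second index: mean of the S.D.s within each of the four trial blocks."""
    return mean(stdev(block) for block in [pretest, *test_blocks])


# Hypothetical judgements for one subject in one session (mm).
pretest = [50.0, 51.0, 49.0, 50.0]
test_blocks = [
    [48.0, 47.0, 48.0, 49.0],
    [47.0, 46.0, 47.0, 48.0],
    [46.0, 47.0, 46.0, 45.0],
]

index_1 = block_kae_sd(pretest, test_blocks)
index_2 = mean_within_block_sd(pretest, test_blocks)
print(round(index_1, 3), round(index_2, 3))
```

Each index would be computed per subject and per session, and the two state groups then compared on the resulting values.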

This issue was also examined by comparing responses on another task. Some subjects from 
each state group participated in a third session during which they took a time judgement task 
which used two methods. First, subjects made four estimates of a 2 min time interval (method of 
production). Second, the experimenter demonstrated a 20 sec time interval and then subjects 
made four estimates of this time interval (method of reproduction). For each subject, two 
standard deviations were computed — one for the 2 min estimates and one for the 20 sec 
estimates. The mean of the standard deviations of the 2 min estimates for the consistent state 
group (M = 16.20 sec, S.D. = 9.35 sec) did differ significantly (t = 2.36, d.f. = 29, P < 0.05 
two-tailed) from the mean for the inconsistent state group (M = 9.77 sec, S.D. = 4.42 sec) but this 
difference is opposite to the direction predicted by the ‘stylistic consistency of perceptual report’ 
hypothesis: the consistent state group showed greater variability. The mean of the standard 
deviations of the 20 sec estimates for the consistent state group (M = 1.16 sec, S.D. = 0.85 sec) 
did not differ significantly (t = 0.55, n.s.) from that for the inconsistent state group (M = 1.23 sec, 
S.D. = 0.61 sec). 

None of these results support the hypothesis that the consistent and inconsistent state groups 
differ in a stylistic tendency to be consistent/inconsistent in perceptual report. 


Discussion 

The main finding from this study is that the consistent state group shows significant KAE 
test-retest reliability whereas the inconsistent state group does not. Although a self-report 
measure of organismic state was used, ancillary analyses show that the measure does not 
apparently tap a general style of making perceptual judgements. Failure to find support for this 
alternative hypothesis strengthens our view that (a) the consistent state group and the 
inconsistent state group do differ in consistency of true organismic state and thus that (b) the 
observed findings regarding test-retest reliability do reflect variation in organismic state. 

In further ancillary analyses, evidence of systematic individual differences in bias in the 
second session pretest was also observed, regardless of state group. Furthermore, the two 
groups did not differ with respect to the magnitude of this bias. We had no a priori expectation 
here — we simply felt it important to determine whether or not the differential bias, known to 
occur in the second session of KAE (Bakan & Thompson, 1962; Baker et al. in press) was 
equally present in both groups. It is. 

The effect of state on test-retest reliability in KAE has not been previously assessed, but the 
results reported here are convergent with the KAE literature. Those who postulated that KAE 
scores reflect the status of a hypothetical mechanism which modulates intensity of incoming 
stimulation have long maintained that this mechanism mediates both enduring differences 
between people (trait variance) as well as short-term changes within a person (state variance) 
(Petrie et al. 1963; Petrie, 1967). It is consistent also with previous empirical findings that state 
variables such as dexedrine and phenobarbital (Gupta, 1974) and menstrual cycle (Baker et al. 
1974) affect KAE. 

The overall retest reliability value for this study is also consistent with the literature. 
Combining consistent and inconsistent state groups, the retest r for the entire sample was 0.14 
(d.f. = 63, n.s.). This matches quite closely the median correlation of 0.17 in five samples from 
four previous retest reliability studies of identical form KAE (Becker, 1960, r = 0.04; Platt, 
Holtzman & Larson, 1971, r = 0.279; Morgan & Hilgard, 1972, r = 0.18; Weintraub, Green & 
Herzog, 1973, r = 0.17 and r = 0.02). Moreover, since the previous test-retest studies did not 
distinguish between consistent and inconsistent state group status, one might expect, if state 
really does affect retest reliability, that these retest correlations would tend to fall 
between the values obtained for our consistent and inconsistent state groups. In fact, the values 
for the consistent and inconsistent state groups, 0.349 and -0.076, respectively, neatly bracket 
the range of values, 0.279 to 0.02, reported in the previous studies. 
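
The median and the bracketing claim can be verified directly from the figures quoted above. The sketch uses only values stated in the text and footnote; the dictionary labels are ours.

```python
from statistics import median

# Identical-form retest correlations quoted in the text (five samples, four studies).
prior_retest_r = {
    "Becker (1960)": 0.04,
    "Platt, Holtzman & Larson (1971)": 0.279,
    "Morgan & Hilgard (1972)": 0.18,
    "Weintraub, Green & Herzog (1973), I < T": 0.17,
    "Weintraub, Green & Herzog (1973), I > T": 0.02,
}

# Retest correlations for the two state groups in the present study.
r_consistent = 0.349
r_inconsistent = -0.076

median_prior = median(prior_retest_r.values())
lowest, highest = min(prior_retest_r.values()), max(prior_retest_r.values())
bracketed = r_inconsistent <= lowest and highest <= r_consistent

print(median_prior)        # 0.17
print(lowest, highest)     # 0.02 0.279
print(bracketed)           # True
```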

We do not conclude that variation in organismic state by itself explains the low KAE 
test-retest reliabilities reported in the literature. Indeed, the finding that the retest correlation 
for the consistent state group is significant is much less important than the fact that the 
magnitude of this correlation is modest, falling far below what is usually considered adequate for 
personality measures (0.70). Contrary to Petrie’s (1974) claims, inadequate test-retest reliability 
was observed here even with consistent organismic state and an exact replication of her 
procedures. 

These findings, coupled with the presence of significant differential bias in both state groups, 
lead us to conclude that low KAE reliability is due both to differences in organismic state and 
to the presence of systematic differential carry-over effects in the second session (Baker et al. 
1976, in press; Mishara & Baker, 1978a, b). As we have noted previously, in contrast with the 
recommendation of the critics that KAE should be dropped as a personality measure, the 
presence of such carry-over effects implies only that the KAE task should be restricted to a 
one-session procedure. The true test of the measure is whether KAE shows consistent validity 
findings for hypothesized criterion variables. As we have summarized elsewhere (see 
introduction; see also Baker et al. 1976, Table 1), the answer to this question of whether KAE 
shows such consistent validity findings is a clear-cut ‘yes’ for session 1 KAE. Therefore, session 
1 KAE should be retained as a measure, despite poor test-retest reliability findings. 

In the light of session 1 KAE’s clear-cut validity, the finding that state influences KAE should 
not inhibit its use as a personality measure. The effect of state variance on KAE can be 
demonstrated without controlling for trait, as Baker et al. (1974) and our present study indicate. 
Similarly, none of the much larger number of single session KAE-trait findings controlled for 
state, demonstrating that it is not necessary to control for state in order to obtain significant 
personality findings. This suggests that KAE is in fact a robust measure of personality, since it 
withstands demonstrated state variance. Of course, some means of separating trait and state 
variance would be useful, and this should be a question for future research. 

Finally, we would like to underscore two general methodological issues that arise in this study. 
The first is the question of whether a self-report measure of organismic state can be used to 
measure consistency of that state over time, or whether it simply reflects a generally consistent 
or inconsistent perceptual style of self-report. Since we were able to obtain measures of 
perceptual consistency for our subjects for short-duration tasks during which organismic state 
remains relatively constant, we were able to test this question. Groups which reported consistent 
organismic state over two sessions did not differ significantly on these short-duration measures 
of perceptual consistency from groups which reported inconsistent organismic state over two 
sessions. We thus conclude that consistent/inconsistent report of organismic state does not tap 
some general perceptual style of self-report. Thayer (1970), working with a self-report measure 
of activation level (defined somewhat similarly to our tiredness item), found significant 
relationships to a number of physiological indices of activation. Because the correlations 
between the self-report index and the physiological variables were generally higher than the 
intercorrelations among the physiological measures themselves, Thayer concluded that 
‘... self-report may be an integrative variable more representative of general states of bodily 
activation than any single psychophysiological variable’ (Thayer, 1970, p. 93). Our work thus 
supports Thayer’s (1967, 1970) contention that self-reports are a legitimate measure of 
organismic state. 

Second, we have previously contended (Baker et al. 1976) that a general misunderstanding 
exists regarding the weight of test-retest reliability evidence in the process of construct 
validation. Statisticians have frequently warned (e.g. McNemar, 1969) that observed test-retest 
reliability can be a poor indicator of true test reliability, because it can be easily distorted by 
practice effects. Despite these admonitions, researchers in the area of KAE paid little attention 
to this possibility, even though differential bias effects had already been observed for KAE 
(Bakan & Thompson, 1962). Variations in organismic state also can affect retest reliability, and 
do for KAE as we have here demonstrated, but this possibility has not been considered. To 
sensitize researchers to an important, but apparently neglected, issue, we would once again like 
to underscore that low test-retest reliability may have many sources, some of which do not 
invalidate a variable for personality research. 
The relationship between sensation-seeking and Eysenck’s dimensions of 
personality 


Sybil Eysenck and Marvin Zuckerman 





Theory and common correlates suggest a relationship between dimensions of personality measured by the 
Eysenck Personality Questionnaire (EPQ) and the Sensation-seeking Scale. In order to examine these 
relationships, the two tests were given to 219 American and 879 English subjects of both sexes. 
Sensation-seeking was positively correlated with the traits of extraversion and psychoticism as measured by 
the EPQ. There was no relationship between sensation-seeking and the trait dimension of neuroticism. 


The hierarchical theory of personality structure which is almost universally adopted by factor 
analysts (Eysenck, 1970), provides at the higher order end three major type factors (Royce, 
1973) which have been variously named, but which may be designated E 
(extraversion-introversion), N (neuroticism-stability) and P (psychoticism-superego). These 
superfactors have been sufficiently often replicated to suggest stable and permanent principles of 
personality structure (Eysenck & Eysenck, 1969, 1976). At a lower level we have trait concepts 
such as sensation-seeking, and the problem arises as to the relationship between these traits and 
the superfactors; it cannot be assumed that each trait will be subsumed under just one 
superfactor. A further problem arises when we consider that each trait itself may be made up of 
several different subtraits, and that these too will have relations with possibly more than one 
trait and superfactor. 

Since both extraversion and sensation-seeking have been theoretically related to the construct 
of an ‘optimal level of stimulation’ (Eysenck, 1967; Zuckerman, 1969, 1974) the relationship 
between the test measures of these traits has been of considerable interest. Both traits have been 
correlated with sensation-seeking in sexual experience (Zuckerman et al. 1972; Eysenck, 1976; 
Zuckerman, Tushup & Finner, 1976), and have been found to be high in drug abusers 
(Kilpatrick, Sutker & Smith, 1976), and psychopathic and delinquent groups in general 
(Blackburn, 1978; Emmons & Webb, 1974; Eysenck & Eysenck, 1975; Zuckerman, 1974, 1978 a). 

A number of previous studies have reported correlations between the Eysenck Personality 
Inventory (EPI), which does not contain a scale for P, and the General scale of the SSS (Farley, 
1967; Farley & Farley, 1970; Zuckerman & Link, 1968; Bone & Montgomery, 1970; Zuckerman 
et al. 1972). These studies have generally found low but significant correlations between E and 
SSS Gen ranging from 0.12 to 0.58, the typical correlation falling between 0.2 and 0.3. 
Correlations between N and SSS have typically been low and insignificant. 

Until this study, no one has reported correlations with the new Eysenck & Eysenck (1975) 
Personality Questionnaire (EPQ), which includes the P scale, although there have been a few 
unpublished studies (described in Zuckerman, 1974) relating the SSS to the PEN, an earlier 
version of the EPQ. 

The nature of the P dimension is still being debated (Bishop, 1977). While Eysenck & Eysenck 
do not identify it with clinical psychosis, they do feel that it measures a trait which underlies 
psychosis. The facts are that groups such as prisoners, drug addicts, alcoholics, personality 
disorders, and even some groups of normals, such as art students, score as high as or higher than 
clinical psychotics. The items in the scale are diverse in content, and the one thing they do have 
in common is non-conforming, atypical attitudes indicating a lack of socialization or a weak 
‘superego’. Since sensation-seeking is also related to a similar dimension, as evidenced in 
correlations with trait tests and reported experience, we would expect a positive correlation 
between SSS and P. 
A fourth scale on the EPQ is the Lie Scale (L). The L scale is a measure of defensiveness or 
social desirability response set. Farley (1967) investigated the relationship between the SSS and 
social desirability using the EPI L scale along with two others, and concluded that ‘no 
appreciable amount of the item variance in the SSS was accountable in terms of the SD (social 
desirability) scales employed’ (p. 95). We would not, therefore, expect very high correlations 
between the L scale and SSS scales. 


Methods and procedure 
Scales 


SSS. The SSS form V (Zuckerman, Eysenck & Eysenck, 1978) was developed by comparing the results of 
factor analyses in American and English samples to select the items loading on the same four factors in 
males and females in both samples. The results showed a high degree of congruence for the four factors. 
The new form includes ten-item subscales for each of the factors and a total score based on the sum of the 
four scales. The subscales may be described as follows: 

1. Thrill and Adventure-seeking (TAS) contains items indicating a desire to engage in outdoor activities 
involving elements of speed and danger. Most of the activities are socially acceptable. 

2. Experience-seeking (ES) items reflect the search for new experiences through the mind and senses in a 
non-conforming ‘life-style’. 

3. Disinhibition (Dis) items reflect the desire for excitement and variety through parties, drinking and 
sexual variety. 

4. Boredom Susceptibility (BS) items indicate a distaste for repetitive experience, work, or dull, boring 
people, and restlessness when exposed to an unchanging environment. 


EPQ. The EPQ (Eysenck & Eysenck, 1975) has been evolved from earlier scales, the MPI and EPI, and 
represents the culmination of Eysenck’s attempt to develop a questionnaire measure of three basic 
dimensions of personality. In assigning items to the three scales, orthogonality of the dimensions has been a 
major criterion. Thus items which loaded equally on two or three factors tended to be eliminated in favour 
of ‘purer’ items. The EPQ contains four scales: 

1. Introversion-Extraversion (E) is conceived of as a mixture of traits of sociability and impulsivity. 

2. Neuroticism (N) reflects the dimension of emotional stability-instability with persons at the high end 
being prone to respond with dysphoric emotions to a variety of situations. 

3. Psychoticism (P) is also called ‘tough-mindedness’ reflecting unusual ways of thinking, asocial attitudes, 
cruelty, indifference to the feelings of others, and some paranoia. 


Subjects 


Correlations between the SSS and EPQ were done in four samples: male and female American 
undergraduates and male and female English twins. The American sample consisted of 97 male and 122 
female undergraduates from two large undergraduate classes in Introductory Psychology. They were 
primarily freshmen and sophomores in the age range of 17 to 20. 

The English sample consisted of 254 male and 625 female twins, from the Maudsley twin register 
described in Eaves & Eysenck (1975). Twins, treated as individual subjects, were used because this study 
was part of a genetic study which will be published later. The English sample was much more heterogeneous 
than the American one, with an age range of 16 to 70 and of diverse educational and socio-economic 
backgrounds. The twins tend to resemble the general population of England on various personality 
dimensions (Eysenck, 1976). 


Procedure 


The American sample was given the SSS and EPQ at one time in a group classroom setting. The English 
sample had taken the EPQ from one to three years previous to the time they were given the SSS. The SSS 
was mailed to these subjects and taken at home using the same procedure as had been used with the EPQ. 
The return rate of questionnaires was approximately 80 per cent. 


Results 


Table 1 shows the correlations between the scales of the SSS and EPQ in both samples. The 
correlations tend to be higher in the American sample, probably because the two tests were given 
on the same occasion; the long interval between the tests in the English sample might have given 
people a chance to change somewhat in the meantime. 


Table 1. Correlations between SSS and EPQ 

                                      Sensation-seeking Scales 

       TAS                 ES                  Dis                 BS                  Total 
EPQ    Males     Females   Males     Females   Males     Females   Males     Females   Males     Females 

American (a) 
P      0.23      0.08      0.46**    0.21      0.30*     0.39**    0.30**    0.48**    0.50**    0.40** 
E      0.12      0.29**   -0.02      0.24*     0.35**    0.48**    0.14      0.18      0.25      0.44** 
N     -0.04     -0.08      0.07     -0.03      0.16      0.32**   -0.00      0.05      0.08      0.11 
L     -0.16     -0.24*    -0.14     -0.25*    -0.22     -0.47**   -0.05     -0.30*    -0.23     -0.45** 
English (b) 
P      0.05      0.15**    0.24**    0.28**    0.26**    0.29**    0.29**    0.30**    0.30**    0.34** 
E      0.23**    0.17**    0.10      0.10*     0.33**    0.24**    0.15      0.15**    0.32**    0.23** 
N     -0.16*    -0.06     -0.04      0.00      0.13      0.16**    0.09      0.04      0.00      0.04 
L     -0.18*    -0.21**   -0.17*    -0.32**   -0.13     -0.31**   -0.09     -0.18**   -0.29**   -0.35** 

(a) males, n = 97; females, n = 122. 
(b) males, n = 254; females, n = 625. 
* P < 0.01; ** P < 0.001. 


Since such large numbers of subjects were used, the 0.01 level of significance was used rather 
than the conventional 0.05 level; the latter would attribute significance to extremely low levels of 
correlation, particularly in the English sample. 
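
The effect of sample size on the significance threshold can be illustrated with a short sketch. It is not part of the original analysis: it assumes a two-tailed test and uses the Fisher z approximation to estimate the smallest correlation that would reach a given significance level, and the function name `critical_r` is introduced here purely for illustration. With the largest group (n = 625), the 0.05 criterion is met by correlations below 0.1.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n: int, alpha: float) -> float:
    """Approximate two-tailed critical |r| for n subjects (Fisher z approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Under the null hypothesis, atanh(r) is roughly normal with SD 1/sqrt(n - 3).
    return tanh(z_crit / sqrt(n - 3))


# Group sizes as reported for the four samples.
SAMPLE_SIZES = {
    "American males": 97,
    "American females": 122,
    "English males": 254,
    "English females": 625,
}

for label, n in SAMPLE_SIZES.items():
    print(f"{label:17s} n={n:3d}  "
          f"r(0.05)={critical_r(n, 0.05):.3f}  r(0.01)={critical_r(n, 0.01):.3f}")
```

An exact test based on the t distribution would give slightly different thresholds; the approximation is used only to keep the sketch free of external dependencies.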

The expected pattern of positive correlation between the SSS and the P and E scales of the 
EPQ was found. Looking at the correlations between the SSS total score and the EPQ scores we 
can see that all are significant with the exception of the E vs. SSS correlation in the American 
males, which falls just short of significance at the 0.01 level. The P scale correlates rather 
consistently with all of the SSS subscales except Thrill and Adventure-seeking (TAS). The E 
scale correlates most reliably and highly with the Disinhibition (Dis) subscale, and somewhat 
lower with the TAS subscale, but the correlations with Experience-seeking (ES) and Boredom 
Susceptibility (BS) are quite low and often insignificant. 

As expected, there was little correlation between the N scale and the SSS. The correlations 
between N and the total score of the SSS were essentially zero in all four samples and only 3 of 
the 16 correlations with subscales of the SSS were significant. 

The L scale tended to correlate negatively with the SSS, particularly in the females of both 
samples. While most of the correlations between the SSS subscales and L were in the 0.2 to 0.3 
range, a rather high correlation of -0.47 was found between Disinhibition and L in the American 
female group. 


Discussion 

Sensation-seeking is a trait which falls between the E and P dimensions of Eysenck’s model. 
This finding is consistent with Zuckerman’s (1974) summary of the trait correlational literature: 
‘The general trait picture defines sensation seeking as an uninhibited, nonconforming, impulsive, 
dominant type of extraversion’ (p. 103). 

We may raise the question of whether the sensation-seeking trait can be accounted for entirely 
in terms of the E and P dimensions. The multiple correlations between SSS total and E and P 
are 0.52 and 0.59 for American males and females respectively, and 0.41 and 0.43 for English 
males and females. If we use the higher figures in the American sample it would appear that 
between a quarter and a third of the variance in the SSS can be accounted for by the E and P 
factors. Since the N factor accounts for none of the variance, it would appear that, in some 
part, the SSS measures something that is not included in Eysenck’s three dimensions. 
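
The step from multiple correlations to variance accounted for is a simple squaring, sketched below as a check on the arithmetic. The dictionary and function names are illustrative assumptions; the four R values are those reported in the text. The American figures come out at roughly 0.27 and 0.35.

```python
# Multiple correlations of SSS total with E and P, as reported in the text.
MULTIPLE_R = {
    "American males": 0.52,
    "American females": 0.59,
    "English males": 0.41,
    "English females": 0.43,
}


def variance_explained(multiple_r: float) -> float:
    """Proportion of variance accounted for is the squared multiple correlation."""
    return multiple_r ** 2


for group, r in MULTIPLE_R.items():
    print(f"{group:17s} R = {r:.2f}  R^2 = {variance_explained(r):.2f}")
```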

The lack of relationship between the SSS and the N dimension is consistent with data using 
other general anxiety and neuroticism scales (Zuckerman, 1974). However, the SSS is correlated 
with trait scales measuring specific fears of physical harm (Zuckerman, 1974), and predicts (in an 
inverse relation) state fear reactions in certain classical phobic situations (Mellstrom, Cicala & 
Zuckerman, 1976). General anxiety and neuroticism scales measure generalized social fears but 
are not highly predictive of specific fears of physical harm. Low sensation-seekers generally 
avoid risks, and when they cannot avoid them they become anxious. But the tendency to avoid risks is 
not necessarily a neurotic trait. 


Despite prior work showing little or no relationship of the SSS with other lie and social 
desirability scales (Farley, 1967; Thorne, 1971; Zuckerman et al. 1972), the present study does 
show some correlation with the EPQ L scale. The correlations are quite low in males, but some 
of them are moderately high in females. Sensation-seeking, particularly that of the disinhibition 
type, may be considered more socially deviant in females than in males. The study on sexual 
attitudes and behaviour by Zuckerman et al. (1976) showed that college females in the 
post-sex-revolution generation continued to maintain much less permissive attitudes toward sex 
than males, even though they had as much experience as the males. It would follow that a 
female who is high on the lie scale would be less likely than a low scorer to admit the cynical, 
permissive attitudes typical of the disinhibition scale whether she actually agreed with them or 
not. 
Noise and attentional selectivity: A reproducible phenomenon? 


Peter M. Forster and Arthur T. Grierson 





It has been suggested that loud noise increases attentional selectivity. Hockey (1970b) found that noise 
improved performance on the high priority aspects of a complex task and found a corresponding impairment 
on the low priority aspects of the task. An alternative explanation was offered by Poulton (1976) who 
suggested that noise impairs performance by masking auditory feedback from subjects’ responses. Four 
experiments were carried out in order to investigate the phenomenon. No evidence of impaired performance 
was found in any of the four experiments, with or without auditory feedback. Thus, neither attentional 
selectivity nor masking of auditory feedback was found to be a significant factor in these experiments. 

It was concluded that this task is not suitable for investigating the effects of noise on attentional 
selectivity. 


The mechanisms underlying behavioural changes in response to high-intensity sound are not well 
understood despite an extensive experimental literature. This is in part due to differences in the 
type of noise used, in the nature of the task, and in measures of performance. These have 
produced conflicting and often contradictory findings. Noise can be shown to produce either an 
increment (McGrath, 1960), a decrement (Jerison, 1959), or have no effect on performance 
(Davies & Hockey, 1966). A further complication is provided by the various theoretical accounts 
of noise effects, which influence both the design of experiments and the interpretation of results. 

One approach is to view high-intensity sound as a general exteroceptive stimulant which acts 
to raise the level of activation in the so-called ‘reticular arousal system’ (Hockey, 1970 a). This 
view has some support from physiological (Anthony & Ackerman, 1955; Helper, 1957) and 
behavioural studies (Corcoran, 1962; Wilkinson, 1963) and allows application of the 
inverted-U-shaped arousal/performance function (Hebb, 1955; Malmo, 1959; Duffy, 1962) to the 
effects of noise. As the optimum level of arousal is held to be different for simple tasks than for 
complex tasks (Broadhurst, 1959), arousal theorists can account for the differential effects of 
noise in terms of task demands. 

Arousal theory, however, lacks predictive power, largely because those aspects of 
performance most sensitive to arousal have not been clearly defined. One possible explanation is 
that arousal affects performance by narrowing the span of attention. Easterbrook (1959), for 
example, suggested that emotional arousal caused a restriction in the utilization of incidental 
cues, a view which has some support from arousal-related studies involving danger (Weltman & 
Egstrom, 1966), anxiety (Zaffy & Bruning, 1966), heat stress (Bursill, 1958), and stimulant drugs 
(Calloway & Stone, 1960). By focusing attention on particular cues which may or may not be 
task relevant, changes in arousal can account for both enhancement and decrement in 
performance. Furthermore, since complex tasks require integration of information from a 
number of sources, they may be more susceptible to a high degree of selectivity than simple 
tasks with few relevant sources. In this way performance in simple and complex tasks might be 
differentially affected by changes in arousal, as predicted by the Yerkes-Dodson Law 
(Broadhurst, 1959; Hockey, 1973). Hockey (1969, 1970a) argued that high-intensity noise might 
produce similar changes in selectivity of attention, and began a systematic study of the effects of 
noise on the distribution of attention in a complex task. 

Using a dual-task situation based on that designed by Bursill (1958), Hockey (1970a) asked 
subjects to perform a pursuit-tracking task while simultaneously monitoring an array of six 
lamps for occasional flashes. Priority of the tracking task was emphasized by instructions, and it 
was found that noise increased overall tracking performance, increased central light detections, 
but impaired detections of more peripheral light signals. A further experiment (Hockey, 1970b), 
in which the distribution of signals across sources was manipulated, suggested that the 
improvement in central light detections was a function of the subjective probability of signal 
occurrence rather than simply their spatial location. Hockey concluded that under noise, 
attention is concentrated on high-priority sources of information and that this effect is a 
consequence of increased arousal. A further experiment (Hockey, 1970c), lent support to this 
conclusion when it was found that subjects deprived of sleep performed less well on the tracking 
task, and that the advantage of high-probability sources over low-probability sources in the 
subsidiary task was reduced. Hockey concluded that there is a monotonic relationship between 
arousal and attentional selectivity, and that this mechanism can account for the contradictory 
results found in the experimental literature on the effects of noise on human performance. 

More recently, Poulton (1976) has proposed an alternative explanation of Hockey’s results. 
Briefly, he suggests that the high level of noise masks the ‘click’ of the response switches, a cue 
which the subject can use in the ‘quiet’ condition to confirm that the correct part of the switch 
has been pressed. Under high-intensity noise the subject, deprived of this auditory feedback, 
adopts a strategy of keeping a finger over, or close to, the two central response switches which 
are used most often. As a consequence the more peripheral switches suffer most from the loss 
of auditory feedback, producing the pattern of results reported by Hockey. 

The present series of experiments did not begin as an attempt to set up a critical experiment, 
but represents a series of studies designed to explore the phenomenon of attentional selectivity. 
The first two experiments were performed to test the generality of the selectivity effect by 
manipulating the conditions under which it is presumed to occur, while Expts III and IV set out 
to confirm Hockey’s (1970b) findings. 


Experiment I 

The re-allocation of attention under noise load has been found to favour high-probability 
components of a complex task when such sources are sited in the centre of the visual field 
(Hockey, 1970a, b). Although Hockey (1970 a) has shown that response latency changes to 
evenly distributed signals under noise load are independent of their spatial location, more 
conclusive support for the importance of signal probability would be to clearly demonstrate a 
re-allocation of attention to high-priority sources sited at the extreme periphery of the visual 
field. In this way any possible interference of visual with attentional changes would be 
minimized. This experiment is an attempt to demonstrate the reliability of attentional selectivity 
in noise not only when the signal bias is reversed, but also when the possible influence of 
auditory feedback has been eliminated. To this end, the distribution of signals presented in the 
monitoring task is biased in favour of peripheral sources (1 and 6), and silent response switches 
replace the audible microswitches used by Hockey. 

A further confounding factor in Hockey’s experiments lies in the fact that subjects are 
expected to respond to six switches with one finger. Clearly the motor response strategy adopted 
by subjects has a direct influence on individual response latencies and introduces an unnecessary 
element of variance into what is essentially an attempt to measure attention. For this reason only 
five response switches are used, with one for each finger. In this way the motor component of 
the response was reduced to a simple change in finger pressure. 


Apparatus 


The experimenters referred to Hockey (1968) for details of the task and used as much of the original 
apparatus as possible. This consisted of a pursuit-tracking display situated in the centre of an array of six 
light sources set at angles of 20, 50 and 80° on either side of it. The display and the six lamps were 80 cm 
from the subject who was provided with a chin-rest. In the tracking display, a target pointer moved from 
side to side at a rate of 52 reversals per minute and the subjects’ task was to keep a second pointer aligned 
with the target as much as possible by moving a handle up and down with their right hand. Instructions to 
subjects emphasized the priority of the tracking task. 

The secondary monitoring task consisted of responding as quickly as possible to the illumination of one of 
the six light sources. The lamp remained on until the subject pressed the appropriate switch. The five 
response switches were arranged so that each switch lay under the tip of each finger when the left hand lay 
on a central palm rest 1/2 in high. Five silent reed switches type RSA 7445 were used with spring pressure 
reduced to 35 g. Subjects were asked to respond to the lights with the corresponding finger and to use the 
centre switch for both lights 3 and 4. 

A total of 240 signals were presented during the task at random intervals and distributed between sources 
randomly subject to a peripheral bias in the ratio of 4:1:1:1:1:4. 


Subjects and procedure 
Ten subjects from the APU subject panel aged between 17 and 50 years were split into two groups balanced 
for age and sex and tested in both ‘quiet’ (70 dBA) and ‘continuous noise’ (92 dBA) conditions in 
counterbalanced order. Each subject had passed an audiometric screening test which rejected those with 
hearing impairment as great as 35 dB in one ear, or 30 dB in both ears over the range 0.5-8 kHz. 

All subjects underwent a full 40 min practice session in ‘quiet’ at least two days prior to testing, and all 
test sessions took place between 5 and 8 p.m. with one week between each session. 


Noise 


Continuous broadband noise in the range 62.5 Hz to 4 kHz was used, weighted in accordance with current hearing 
hazard recommendations (attenuated by 3 dB per octave up to 1 kHz). The sound pressure levels (SPL) of 
70 dBA (quiet) and 92 dBA (noise) conform to recent APU recommendations that a maximum noise rating of 
90 be used (equivalent to 92 dBA for 40 min). 


Results 

Performance measures. Tracking performance was calculated as the proportion of maximum 
possible time on target (TOT) for each 10 min period, and monitoring performance scored in 
terms of response latency to each source. For statistical analyses the Wilcoxon matched-pairs, 
signed-rank test was employed (after Hockey, 1968). 
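
As a rough sketch of how the matched-pairs signed-rank statistic is formed, the following computes T (the smaller of the positive and negative rank sums) for paired scores. The function name and the example latencies are hypothetical and are not data from the study; looking T up against critical values is left out.

```python
def wilcoxon_signed_rank(first, second):
    """Return (T, n): the smaller signed-rank sum and the number of non-zero pairs."""
    # Paired differences; zero differences are dropped, as in the classical procedure.
    diffs = sorted((b - a for a, b in zip(first, second) if b != a), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        # Find the run of tied absolute differences and give each the average rank.
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus), len(diffs)


# Hypothetical latencies (msec) for five subjects, NOT data from the study.
quiet = [650, 700, 640, 710, 690]
noise = [665, 695, 670, 710, 720]
t_stat, n_pairs = wilcoxon_signed_rank(quiet, noise)
print(f"T = {t_stat}, n = {n_pairs}")
```

A small T relative to the number of non-zero pairs indicates that the differences lie predominantly in one direction.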


Table 1. Mean TOT score in each time period (proportion of maximum TOT) Expt. I 


                          10 min period 

            1        2        3        4        Mean     Difference (1-4) 
Quiet       0.347    0.314    0.302    0.296    0.315    +0.051 
Noise       0.355    0.323    0.313    0.314    0.326    +0.041 


Tracking. Mean TOT for quiet and continuous noise conditions are presented in Table 1. There 
was a significant decrement in TOT over time in both quiet (P < 0.005) and noise (P < 0.01). 
However, differences in TOT between quiet and noise conditions did not reach significance at 
any time period during the test or over the full 40 min. 


Monitoring. Table 2 shows response latencies to each source in quiet and noise. Mean central 
(sources 2, 3, 4, 5), peripheral (sources 1 and 6), and selectivity (central-peripheral) scores were 
calculated and are presented in Table 3. 

There are significantly faster responses to biased (1, 6) as opposed to unbiased (2, 3, 4, 5) 
sources in both quiet (P < 0.005) and noise (P < 0.005) at all times during the test. There are no 
significant differences, however, in response latencies to central and peripheral sources between 
conditions, and selectivity (C-P) scores show no significant difference between quiet and noise. 
Table 2. Mean detection latencies for each source over 40 min (msec) Expt. I 


                              Signal source 

                    1       2       3       4       5       6       Mean 
Quiet               654     805     736     777     826     646     741 
Noise               678     834     747     787     853     643     758 
Difference (N-Q)    +24     +29     +11     +10     +27     -3      +17 


Table 3. Mean detection latencies for central (C) and peripheral (P) sources over 40 min (msec) 
Expt. I 


                         Signal source 

                    C       P       Difference (C-P) 
Quiet               786     650     +136 
Noise               805     661     +144 
Difference (N-Q)    +19     +11     +8 
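
The central, peripheral and selectivity scores of Table 3 can be reproduced from the per-source latencies of Table 2. The sketch below is illustrative only: the variable and function names are assumptions, and the quiet latency for source 4 is read as 777 msec, which is the value consistent with the reported N-Q difference of +10.

```python
# Mean detection latencies (msec) per signal source, taken from Table 2 (Expt. I).
# Source 4 in quiet is read as 777, consistent with the N-Q difference of +10.
QUIET = {1: 654, 2: 805, 3: 736, 4: 777, 5: 826, 6: 646}
NOISE = {1: 678, 2: 834, 3: 747, 4: 787, 5: 853, 6: 643}

CENTRAL = (2, 3, 4, 5)   # unbiased sources in Expt. I
PERIPHERAL = (1, 6)      # biased (high-probability) sources in Expt. I


def mean_latency(latencies, sources):
    """Unweighted mean latency over the given sources."""
    return sum(latencies[s] for s in sources) / len(sources)


def selectivity(latencies):
    """Selectivity score: central mean minus peripheral mean (C-P)."""
    return mean_latency(latencies, CENTRAL) - mean_latency(latencies, PERIPHERAL)


for name, data in (("Quiet", QUIET), ("Noise", NOISE)):
    print(f"{name}: C = {mean_latency(data, CENTRAL):.2f}, "
          f"P = {mean_latency(data, PERIPHERAL):.2f}, "
          f"C-P = {selectivity(data):.2f}")
```

The unrounded noise values differ from the tabled figures by less than 1 msec, which suggests the table was built from rounded means.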





Considering responses to unbiased sources (2, 3, 4 and 5), there are significantly faster 
responses to central sources (3 and 4) than to other unbiased sources (2 and 5) in both quiet and 
noise conditions (P < 0.01). 

These results represent a failure to confirm the Hockey effect in either the primary or 
secondary task. A further experiment was designed to evaluate the effect of intermittent noise 
(IN) on attentional selectivity since IN, by preventing sensory adaptation, should be more 
arousing than continuous noise and should produce a greater effect on the pattern of attention. 
McGrath (1960, 1963) for example emphasizes the importance of varied stimulation in 
maintaining arousal. 


Experiment II 
Method 


Design, apparatus and procedure in this experiment are identical to those used in Expt I, apart from using 
eight subjects and intermittent rather than continuous noise. 


Noise 

Broadband noise (62.5 Hz to 4 kHz) at 92 dBA was presented intermittently with quiet (70 dBA) periods. Noise 
bursts ranged from 0.6 to 5.4 sec in a random fashion with a mean duration of 3 sec. Inter-noise intervals 
varied randomly from 0.3 to 2.7 sec with a mean gap of 1.5 sec. 
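
A schedule of this kind can be sketched as follows. The distribution of durations is not stated; the sketch assumes uniform sampling, which happens to give the reported means (3 sec for bursts, 1.5 sec for gaps). The function and constant names are hypothetical.

```python
import random

NOISE_RANGE = (0.6, 5.4)  # burst duration limits in sec (mean 3 sec if uniform)
GAP_RANGE = (0.3, 2.7)    # quiet gap limits in sec (mean 1.5 sec if uniform)


def noise_schedule(total_sec, rng):
    """Return a list of (state, start_sec, duration_sec) covering total_sec.

    Assumption: durations are drawn uniformly within the stated limits.
    The final event is cut short so that the schedule ends at total_sec.
    """
    events = []
    t = 0.0
    noise_on = True
    while t < total_sec:
        low, high = NOISE_RANGE if noise_on else GAP_RANGE
        state = "noise" if noise_on else "quiet"
        duration = rng.uniform(low, high)
        if t + duration >= total_sec:
            events.append((state, t, total_sec - t))
            break
        events.append((state, t, duration))
        t += duration
        noise_on = not noise_on
    return events


schedule = noise_schedule(40 * 60, random.Random(0))  # one 40 min session
noise_time = sum(d for state, _, d in schedule if state == "noise")
print(f"{len(schedule)} events, noise present {noise_time / (40 * 60):.0%} of the session")
```

Under the uniform assumption, noise is present for about two-thirds of the session on average.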


Results 

Tracking performance. Mean TOT scores in Q and IN conditions are presented in Table 4. There 
was a significant decrement in TOT over time in both quiet (P < 0.005) and IN (P < 0.01). There 
were no significant differences in TOT between conditions at any time during the test or overall. 


Monitoring. Table 5 shows response latencies to each signal source in Q and IN. Mean central 
(2, 3, 4, 5), peripheral (1, 6), and C-P (selectivity) scores are calculated and presented in Table 6. 
As in Expt. I there are significantly faster responses to biased (sources 1, 6) as opposed to 
unbiased (sources 2, 3, 4, 5) signals in both Q (P < 0.05) and IN (P < 0.005). This confirms the 
expectancy effect on the pattern of responses found in Expt. I. Once again, however, there was 
no evidence that the presence of IN changes this pattern of responses. No significant differences 
in response to central, peripheral or C-P sources occurred between Q and IN conditions. 


Table 4. Mean TOT score in each time period (proportion of maximum TOT) Expt. II 


                          10 min period 

            1        2        3        4        Mean     Difference (1-4) 
Quiet       0.372    0.336    0.317    0.320    0.336    +0.052 
Noise       0.371    0.346    0.331    0.309    0.339    +0.062 


Table 5. Mean detection latencies for each source over 40 min (msec) Expt. II 


                              Signal source 

                    1       2       3       4       5       6       Mean 
Quiet               657     942     811     856     854     652     795 
Noise               680     874     824     856     920     644     800 
Difference (N-Q)    +23     -68     +13     0       +66     -8      +5 


Table 6. Mean detection latencies for central (C) and peripheral (P) sources over 40 min (msec) 
Expt. II 


                         Signal source 

                    C       P       Difference (C-P) 
Quiet               866     655     +211 
Noise               869     662     +207 
Difference (N-Q)    +3      +7      -4 


Responses to central unbiased lights (3 and 4) were faster than responses to other unbiased 
lights (2 and 5) in both quiet and intermittent noise (P < 0.05), thus confirming the tendency for 
central responses to be faster even though they occurred no more frequently than lights 2 and 5. 
This however does not necessarily constitute evidence for visual ‘funnelling’, for responses to 
these lights took place via a single switch. Although the bias in signal presentation was in the 
ratio 4:1:1:1:1:4, the response procedures produced an effective bias of 4:1:2:1:4 per switch. It 
is possible therefore that the expectancy effect on response latencies seen in both experiments is 
based on subjects’ awareness of their response pattern rather than the stimulus pattern. 
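
The step from the stimulus ratio to the effective per-switch ratio can be checked with a short sketch. The switch numbering (1 to 5, with the centre switch as 3) and all names below are assumptions made for illustration; the light weights, the shared centre switch and the total of 240 signals come from the description of the task.

```python
from collections import Counter

# Relative signal frequency per light (peripheral bias used in Expts I and II).
LIGHT_WEIGHTS = {1: 4, 2: 1, 3: 1, 4: 1, 5: 1, 6: 4}

# Assumed switch numbering: lights 3 and 4 share the centre switch (switch 3).
LIGHT_TO_SWITCH = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4, 6: 5}


def switch_weights(light_weights, light_to_switch):
    """Sum the light weights that fall on each response switch."""
    totals = Counter()
    for light, weight in light_weights.items():
        totals[light_to_switch[light]] += weight
    return [totals[switch] for switch in sorted(totals)]


def expected_counts(weights, total_signals):
    """Expected number of signals per position for a given set of relative weights."""
    unit = total_signals / sum(weights)
    return [unit * w for w in weights]


per_light = [LIGHT_WEIGHTS[light] for light in sorted(LIGHT_WEIGHTS)]
per_switch = switch_weights(LIGHT_WEIGHTS, LIGHT_TO_SWITCH)
print("per light :", per_light, expected_counts(per_light, 240))
print("per switch:", per_switch, expected_counts(per_switch, 240))
```

Because signals were distributed randomly, the counts are expectations over the 240 signals and need not match any single session exactly.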

These results show that intermittent noise is no more likely to produce a change in attentional 
selectivity than the continuous noise used in Expt. I. Since the generality of the phenomenon of 
attentional selectivity has not been confirmed by these experiments, it was concluded that 
further investigation must follow the original paradigm manipulated by Hockey. In the following 
experiments, subjects were required to respond to centrally biased signals via six switches using 
one finger only, for it may be that a strategy of responding with five fingers reduces the demands 
of the task to such a level that no shifts in the distribution of attention are necessary, even
under a high noise load. 
Experiment III

This experiment is an attempt to demonstrate changes in attentional selectivity in the absence of 
auditory feedback from responses. Poulton’s hypothesis (1976) predicts either an improvement 
overall in noise (because of its arousing properties), or no difference between conditions. If 
noise increases attentional selectivity, the pattern of performance on the secondary task should 
change between conditions; under noise subjects will show an improvement in central source 
monitoring and a corresponding impairment in responses to peripheral sources. 


Method 


Subjects were eight members of the APU subject panel who had volunteered to take part in experiments 
involving exposure to noise. There were five women and three men in the age range 21-41 years, and all had 
passed the standard audiometric screening test. 


Noise 
Noise was identical to that used in Expt. I. 


Apparatus 

The six reed switches were set into a response board such that only the response button protruded above the 
level of the base. No palm-rest was used. Subjects were asked to respond to signals by pressing the 
appropriate switch with their left index finger and without looking at the switches. Signals were presented 
randomly in both time and location, subject to a central bias, at an average rate of six per minute over the 
40 min of the task. The distribution of signals over the six sources was in the ratio 1:1:4:4:1:1. 
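
As a rough guide to how much data each source contributes, the expected number of signals per source can be worked out from the figures above. This Python sketch treats the average rate of six per minute as exact over the 40 min session; actual counts would vary because presentation was random.

```python
# Figures taken from the Apparatus description.
RATE_PER_MIN = 6
DURATION_MIN = 40
RATIO = [1, 1, 4, 4, 1, 1]  # sources 1-6, central bias

total_signals = RATE_PER_MIN * DURATION_MIN

# Share out the total in proportion to the bias ratio (exact here, since
# the total is divisible by the sum of the ratio).
expected_counts = [total_signals * w // sum(RATIO) for w in RATIO]

print(total_signals, expected_counts)
```

On this reckoning each peripheral or intermediate source yields only about a quarter as many observations per session as each central source, which bears on how stable the per-source means can be.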


Design and procedure 


The experimental design and procedure did not differ from that used in Expt. I apart from the changes 
described above. As before, the Wilcoxon matched-pairs, signed-ranks test was used for all analyses of 
results. 
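
For readers unfamiliar with the Wilcoxon matched-pairs, signed-ranks test, the following Python sketch shows how the statistic T is obtained for eight paired scores, with an exact two-sided probability found by enumerating all sign patterns. The per-subject latencies are hypothetical, since individual scores are not reported, and the exact enumeration is only one way of obtaining a probability; it is not necessarily the procedure the authors followed.

```python
from itertools import product


def signed_rank_statistic(x, y):
    """Return (T, ranks) for paired samples x and y.

    Zero differences are dropped, absolute differences are ranked with
    average ranks for ties, and T is the smaller of the two rank sums.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), ranks


def exact_two_sided_p(t, ranks):
    """Proportion of all sign patterns giving a T at least as extreme as t."""
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(w_plus, total - w_plus) <= t:
            hits += 1
    return hits / 2 ** len(ranks)


# Hypothetical latencies (msec) for eight subjects; not data from the paper.
quiet = [1010, 980, 1100, 1050, 990, 1020, 1070, 1000]
noise = [1000, 1000, 1070, 1010, 1040, 960, 1000, 1080]

t_stat, ranks = signed_rank_statistic(quiet, noise)
p_value = exact_two_sided_p(t_stat, ranks)
print(t_stat, p_value)
```

With only eight subjects the smallest attainable two-sided probability is 2/256, reached when every subject changes in the same direction.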


Results and discussion 

Tracking. Mean TOT scores are presented in Table 7 for noise and quiet conditions. The decline 
in TOT over time is significant in both quiet (P < 0.01) and noise (P < 0.05) conditions. Although
there is no significant difference in mean TOT scores between conditions, it is noteworthy that 
the tendency is in the opposite direction to that predicted; on average, subjects score less in 
noise than in quiet. 


Table 7. Mean TOT score in each time period (proportion of maximum TOT) Expt. III

                    10 min period
          1        2        3        4        Mean     Difference (1-4)
Quiet     0.356    0.345    0.344    0.323    0.342    +0.033
Noise     0.361    0.342    0.320    0.331    0.339    +0.030


Monitoring. The mean detection latencies for each signal source in quiet and noise conditions are 
presented in Table 8. There is no significant difference in mean detection latency between 
conditions at any time period or over the full 40 min of the task. However if we consider the 
four central sources 2, 3, 4 and 5, there is a significant reduction in response latency to these in 
time periods two, three and four (P < 0.05). The relevant data are presented in Table 9. The
increase in latency to sources 1 and 6 was not significant. These results are not compatible with
the attentional explanation of noise effects proposed by Hockey (1970b). The most obvious
Table 8. Mean detection latencies for each source over 40 min (msec) Expt. III

                    Signal source
                    1      2      3      4      5      6      Mean
Quiet               1095   1222   845    806    1196   1175   1057
Noise               1126   1156   816    747    1122   1184   1025
Difference (N-Q)    +31    -66    -29    -59    -74    +9     -32


Table 9. Mean detection latencies for signal sources 2, 3, 4 and 5 in each time period (msec)
Expt. III

                    10 min period
                    1      2      3      4      Mean
Quiet               959    1014   1083   1017   1018
Noise               961    946    987    950    961
Difference (N-Q)    +2     -68*   -96*   -67**  -57

* P < 0.05; ** P < 0.02.


problem is the improvement in monitoring performance on sources 2 and 5 when the signal bias 
favours only sources 3 and 4. Clearly there is no evidence to suggest that loud noise produces a 
shift in the amount of attention paid to high-probability and low-probability components of the 
task. As in Expts I and II there was no evidence of an improvement in tracking performance in 
noise. These findings will be considered further in the general discussion. 

From the three previous experiments it is clear that the phenomenon of attentional selectivity 
is difficult to demonstrate although it must be borne in mind that each study has differed from 
Hockey (1970b), especially with respect to the response switches used. A final experiment was
undertaken to determine the extent to which such differences in procedure may be responsible 
for the failure to produce the expected pattern of results. 


Experiment IV 


In this experiment the apparatus, instructions and procedures were as similar as possible to that 
used by Hockey (1970b). According to both attentional and auditory masking hypotheses, this
experiment should demonstrate a change in the pattern of responses between quiet and noise 
conditions. 


Apparatus 


The apparatus consisted of the original equipment used by Hockey (1970b) and described in Expt. I. This
included six microswitches whose operation could be clearly heard in the ‘quiet’ (70 dBA) condition. Each
subject was asked to respond to the signals using his left index finger only, and the distribution of signals 
over the six sources was in the ratio 1:1:4:4:1:1. Cam speed on the tracking task was re-set to 14 rev/min 
(from 1'/, rev/min) to conform with Hockey (1968) giving a tracking rate of 69 reversals per minute. 


Design and procedure 


Design, procedure and noise conditions were identical to those described in Expt. I. Eight subjects, five 
women and three men, aged between 18 and 39 years, were used and all had passed the standard audiometric 
screening test. All test sessions took place between 2 and 5 p.m. with one week between sessions. 
Table 10. Mean TOT in each time period (proportion of maximum TOT) Expt. IV

                    10 min period
          1        2        3        4        Mean     Difference (1-4)
Quiet     0.353    0.333    0.314    0.314    0.329    0.024
Noise     0.364    0.343    0.320    0.327    0.339    0.025


Table 11. Mean detection latencies for each source over 40 min (msec) Expt. IV

                    Signal source
                    1      2      3      4      5      6      Mean
Quiet               1400   1448   1163   1186   1436   1250   1314
Noise               1293   1375   1124   1159   1395   1256   1267
Difference (N-Q)    -107   -75    -39    -27    -41    +6     -47


Results 

Tracking. Tracking performance was calculated as the proportion of possible time on target for 
each 10 min period. Mean TOT scores for quiet and noise conditions are presented in Table 10. 
There was a significant decrement in tracking performance over time in both quiet (P < 0.005)
and noise (P < 0.01) conditions. There were no significant differences in TOT between conditions
in any time period or overall. 


Monitoring. Table 11 shows mean response latencies to each signal source in quiet and noise 
conditions. There were no significant differences in response latencies between conditions. As 
in previous experiments there were significantly faster responses to biased vs. unbiased sources 
(P < 0.001) in both quiet and noise conditions.

These results show a marginal increase in performance efficiency under noise in both the 
primary and secondary tasks, as might be predicted from an arousal explanation of noise effects, 
although this increase fails to reach significance. There is no evidence that loss of auditory 
feedback or the mere presence of loud noise will produce the pattern of results reported by 
Hockey (1970b).


General discussion 


The results reported here are consistent across all four experiments. The effects of noise on a 
complex task are found to be minimal and clearly differ from those reported by Hockey (1968, 
1970a, b). In all four studies, tracking performance at 70 dB(A) (‘quiet’) decreased significantly
over the 40 min of the task, a decrement that was not arrested by the presence of continuous or 
intermittent noise. 

A further difference from the results reported by Hockey (1968) is in the overall tracking level. 
In general, Hockey’s subjects produced TOT scores in the region of 60-70 per cent TOT, 
whereas the present subjects score consistently less. One possible reason is that the tracking 
cam is driven at different speeds between experiments although on examination this seems 
unlikely to account for such a difference in scores. The present experimenters, for example, use a 
speed of 1% rev/min in Expt. IV which gives tracking scores around 33 per cent TOT, while in 
Expts I-III a speed of 1'/,; rev/min produced TOT scores marginally less. It would appear that 
only a considerable difference in cam speed could account for such disparate scores. If Hockey's 
quote of 114 rev/min is correct then the fault may lie in the mechanical condition of the 
apparatus. For example, the recording electrodes which make contact when the two pointers are
in line (on target), may have lost efficiency due to mechanical wear and consequently demand a 
greater degree of accuracy in alignment to register as time on target. In any case we do not have 
the information to equate accurately the overall specification of the machine which may well 
have been altered in some respect during the intervening years since Hockey’s work. 

The results on monitoring performance show a similar consistency. Contrary to prediction, 
there is no evidence that responses to high-priority sources of information are selectively 
enhanced under noise load. In the two experiments where the secondary task was made easier 
by allowing subjects to poise their fingers over the response switches, the presence of noise had 
no effect on the pattern of responses to peripheral signals. Only in Expt. III was there evidence 
that responses to central sources improved in noise, yet here the improvement occurred on the
four most central lights, irrespective of signal probability. This result is difficult for both auditory 
feedback and attentional selectivity theorists to account for since both use signal frequency as a 
key factor in distinguishing between responses to central and peripheral sources. Nor can this 
result be considered as support for a general funnelling of vision under noise load — the results of 
Expts I and II show that noise will not reduce the faster responses to peripheral sources in 
favour of more central locations. 

Although masking of auditory feedback can theoretically account for the pattern of response 
changes reported by Hockey, the results of Expt. IV clearly show that while loss of feedback 
may occur it does not produce the particular response pattern predicted by Poulton (1976). In 
general our results are more consistent with those studies which have shown noise to improve 
performance by increasing the subject’s level of arousal (e.g. Corcoran, 1962; Wilkinson, 1963) 
although the arousing properties of noise may not be as robust as other studies (e.g. Hockey, 
1968, 1970a, b) using this paradigm might suggest.

There are two main factors which must be considered as possible reasons for the failure of 
these experiments to reproduce the phenomenon of attentional selectivity. The first is that we 
used older, mainly female subjects, whereas Hockey used young naval ratings. Although our 
sample may well be less susceptible to loud noise we feel it unlikely that this can account for 
our negative results. No individual subject in any of our experiments produced the predicted 
pattern of responses — irrespective of age or sex. A more plausible explanation is that the present 
subjects were exposed to 92 dB(A) whereas Hockey used 100 dB(A). It is clear that an 8 dB 
difference in sound pressure level is not insignificant, yet one might expect continuous or 
intermittent noise at 92 dB(A) to produce at least a tendency in the expected direction, especially 
since Hockey (1969), following Broadbent (1957, 1958), proposed a minimum SPL of 90 dB for 
the observation of noise effects. Although we must acknowledge the possibility of a threshold 
effect at higher noise levels, we would suggest to future researchers that changes in attentional 
selectivity under loud noise are not easily demonstrated, except perhaps at levels which generate 
anxiety in the subjects. 
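
To put the 8 dB gap between 92 dB(A) and 100 dB(A) in concrete terms, the standard decibel relations can be applied, as in the Python sketch below. It assumes the two levels are directly comparable on the same weighting scale; the variable names are illustrative.

```python
# Noise levels reported for the two sets of studies, in dB(A).
LEVEL_ORIGINAL = 100
LEVEL_PRESENT = 92

delta_db = LEVEL_ORIGINAL - LEVEL_PRESENT

# Standard decibel definitions: 10*log10 for intensity (energy) ratios,
# 20*log10 for sound pressure ratios.
intensity_ratio = 10 ** (delta_db / 10)
pressure_ratio = 10 ** (delta_db / 20)

print(f"{delta_db} dB: intensity x{intensity_ratio:.2f}, pressure x{pressure_ratio:.2f}")
```

On these relations the original noise condition delivered roughly six times the acoustic intensity of the present one, or about two and a half times the sound pressure.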

Other studies support this general conclusion. Green (personal communication) carried out an 
experiment in which subjects faced a display of four dials, each with a pointer moving one step 
at a time, and in which the task was to detect ‘double jumps’. One dial was biased to make 
twice as many double jumps as the others. In sessions lasting 35 min the subjects performed the 
task either in quiet (70 dB(A)) or loud noise (100 dB(A)) which was of a similar range and 
weighting to that used in the experiments reported here. Noise had no effect on the distribution 
of detections between probable and improbable locations. Loeb & Jones (1976) in a dual tracking 
and monitoring task compared performance in quiet (75 dBA white noise); peak (136 dB peak, 
impact noise); and periodic impulse (105 dB(A) continuous recorded industrial noise) conditions 
for groups in which either tracking or monitoring was deemed the high-priority task. The effect 
of central, peripheral, or no bias in the distribution of signals in the monitoring task was 
ascertained. No effect of noise on monitoring performance was found and tracking performance 
was impaired. Finally, another experiment using Hockey’s original equipment was performed by 
Poulton & Edwards (1977) in which a between-groups design was utilized to evaluate the effect
of noise and heat on performance. It was found that noise produced an improvement in tracking 
performance and also in monitoring performance which was significant only for the two central 
sources with no corresponding decrement in peripheral monitoring performance. 

Considered with the results of the present experiments, these studies are more favourable to 
the idea that noise affects performance by a mild change in arousal than to the suggestion that 
such a change produces an increase in attentional selectivity. It is possible that attentional 
selectivity can best be investigated by manipulating treatments such as fear, danger, and 
stimulant drugs. Clearly some effects of noise are unreliable. 
Attentional selectivity and the problems of replication: A reply to
Forster & Grierson 


Robert Hockey 





Forster & Grierson (1978) have reported a lack of success in replicating the essential findings of the 
experiments of Hockey (1970a, b). The present paper indicates a number of methodological differences
between the two sets of studies, and suggests that the problems caused by these are complicated by an 
unnecessarily narrow interpretation of the attentional selectivity hypothesis. 


Forster & Grierson (1978) have reported a series of experiments aimed at testing the generality 
and reproducibility of the effects of noise on attentional selectivity, originally demonstrated by 
Hockey (1970a, b). In these remarks I do not wish to criticize this work, which appears well 
conducted and unbiased. Instead, I would like to suggest reasons for the discrepancies in the 
form of the data and in the interpretations placed on these by the authors, which have their 
roots in two rather different problems. The first concerns the notion of ‘selectivity’ itself, which 
I believe to be interpreted too narrowly by Forster & Grierson. Secondly, there are a number of 
problems about the nature of ‘replications’ which have not been overcome in the present 
experiments. Although these two issues are logically quite separate, they are, in fact, connected 
in the present case. Let me first offer some views on the nature of selectivity and changes in this 
process under noise. This will also serve as a context for the present discussion. 


The attentional selectivity hypothesis 


In its simplest form attentional selectivity implies that in a situation involving more than one 
activity (i.e. one in which attention is divided) the different components of the task are not 
attended to equally. This is the state of affairs assumed to exist whenever one activity is 
‘primary’ and others ‘secondary’. The conclusion that noise increases attentional selectivity 
(Hockey, 1970a, b) is based on the fact that performance on the primary task (pursuit tracking)
improves, while that on the secondary task (monitoring) does not. The complication is that, in 
those experiments, there was an additional change in the relative performance of subcomponents 
of the monitoring task. In both experiments, detection of the two most central sources 
improved, while that of the four more peripheral sources deteriorated, except in condition A of 
Hockey (1970b), where the monitoring task showed an overall impairment. Hockey concluded
that this difference was due to the presence or absence of a subjective bias in favour of central 
locations, which could be used as a basis for selectivity. The point is that this non-uniform effect 
on the secondary task is not an essential feature of the selectivity explanation of noise effects: 
indeed it was not expected in Hockey’s original experiment (Hockey, 1970a, pp. 31, 35). An 
improvement or no change in tracking, coupled with no change or an overall impairment, 
respectively, in monitoring, would have given rise to an equally valid interpretation of an 
increase in selectivity in those experiments. Clearly, either would have been an easier result to 
explain than the complex pattern actually obtained. Whether central signals are responded to 
more effectively in noise and peripheral signals less effectively will depend on a large number of 
factors not manipulated in Hockey’s experiments. It would be out of place to go into any detail
in this reply, though I would mention overall signal load, detectability of signals, physical 
proximity of sources, and sequential constraints in the occurrence of signals on different sources 
as important secondary task factors. The most important single factor, though, is the nature of 
the primary task itself. If subjects regard this task as slightly less or slightly more important than 
they did in Hockey’s experiments, either more or less of the secondary task may be included in
the ‘high priority’ area of attention. There is no simple formula: the task is too complex for this 
detailed pattern of results to be expected in even similar task situations. If, as we shall see, 
Forster & Grierson’s primary task may have produced such an effect, then the precise changes 
due to noise are not possible to predict. I will return to these issues later. 


The problem of replication 


Forster & Grierson (p. 490) specifically state that one of their aims was to replicate the main 
findings of the original experiments. Clearly, any such aim must attempt to reproduce, as far as 
possible, the major conditions and procedural details of the original. My belief is that they have 
conscientiously followed this objective, yet a number of differences, two major and at least two 
minor, remain. First, there is an obvious discrepancy in the specification of the noise condition 
itself. Although in both the original studies and those reported here the control (quiet) condition 
is 70 dBA, the noise condition in Forster & Grierson’s studies is only 92 dBA, as opposed to 
100 dBA in the original. As they quite properly point out, this change is forced upon them by current 
regulations concerning hearing risk due to exposure to noise (more severe than in 1965-66, when 
Hockey’s experiments were run). Nevertheless, this difference represents a substantial decrease 
in the amount of energy reaching the ear. It is possible that effects of noise on visual monitoring 
are measurable at this level, but it is near the absolute minimum at which such effects have been 
found in previous studies. The second important procedural change is the use of a much more 
difficult tracking task, giving only 30-40 per cent time on target, as against 60-70 per cent in 
Hockey's experiments. Forster & Grierson discuss this discrepancy in terms of changes in the 
speed of the cam, or in the sensitivity of the electrical contacts responsible for recording time on 
target. However, the pursuitmeter apparatus, devised by Dr A. Carpenter, had two 
interchangeable cams, giving different input frequencies in terms of pointer movements. Not 
only does one give performance levels approximately equal to those of Forster & Grierson’s
subjects and one levels like that achieved by subjects in Hockey’s experiments, but only the 
latter (easier) version shows improvement in noise when tested without a secondary task (D. E. 
Broadbent, personal communication). It is extremely probable, therefore, that different cams 
were used in the two versions of the task. This is possibly the most crucial difference between 
the two sets of experiments, as I shall demonstrate later, since it changes the whole pattern of 
attention deployment in the task. The difficulty is that Forster & Grierson presumably were 
unable to determine the exact cam specification, because it was not reported by Hockey in any 
of his papers. (This was because, at the time, he was not aware that the second cam existed, 
just as the present authors also seem unaware of the existence of another cam.) Thus, they are 
not to be blamed for failing to replicate this feature of the task, though the discrepancy in 
tracking level in the two situations presents major difficulties for interpretation. 

Before discussing these, let me conclude this section by referring to two minor changes 
introduced into the experimental set-up. The difference between the subject populations in the 
two sets of studies is mentioned by Forster & Grierson. They used members of the APU subject 
panel, largely women: Hockey used naval ratings. I would agree that this should matter little, 
since Peter Hamilton and I have generally found effects of noise to be at least qualitatively 
similar across groups of naval ratings, students and housewives. The other change is the use of a 
chin-rest in all experiments, including the last one, which is set up to be ‘as similar as possible 
to that used by Hockey (1970)’ (p. 495). I would not wish to argue for some critical role of this 
innocent device, although it may certainly affect the pattern of eye-movements, and make head 
movements unlikely. In my own experiments subjects were encouraged not to actively search 
for signals by making head movements, though they certainly were able to, and sometimes did. 
This is the only change in testing conditions which could have been avoided entirely, and it is
odd that it was not. Even so, with these two minor changes, and particularly in view of the
major changes noted earlier, it is clear that the total situation is sufficiently different from that
used by Hockey for any critical comparison of findings to be impossible. Whatever the reasons 
for the changes the present studies do not meet the requirements of a replication study. Let me 
now look at the data of the experiments themselves, and show how the problems I have raised 
interfere with the process of interpretation of the data. 


Interpretation of experimental data 


In examining the data I will concentrate on Expts III and IV, since the first two are tests of the 
generality of the phenomenon, rather than its reproducibility. The idea of separating factors of 
priority and centrality in the secondary task by biasing the signal distribution towards the 
peripheral sources was one considered by me in 1966. After discussion with Donald Broadbent 
the idea was rejected as requiring too complex a strategy on the part of subjects. It was this 
consideration which in fact led Hockey, instead, to compare unbiased with centrally biased 
displays, the data of which form the basis of the present discussion (Hockey, 1970b). Studies
attempting to extend the range of applicability of the phenomenon, such as Expts I and II, or 
those of Loeb & Jones (1976) (cited in Forster & Grierson), in which monitoring is made the 
primary task and tracking secondary, are interesting and necessary, but they are not relevant to 
the issue of reproducibility. I specifically avoided both those approaches because I regarded 
them as changing the already very complex task situation too much to allow useful comparison 
between the different studies. 

Experiments III and IV use the same instructions, signal distributions and response
requirements as that of Hockey’s condition B. Comparatively silent reed switches are used in 
Expt. III, in an attempt to test Poulton’s (1976) hypothesis that the non-uniform effect of noise 
on the secondary task is a complex consequence of the masking of (audible) clicks made by 
pressing the microswitches used in the original task. Experiment IV reverts to these 
microswitches in a direct attempt at replication. First consider the monitoring data. In Expt. III
the effect of noise is an enhancement of detection latency at the four central locations, coupled 
with a (non-significant) increase in latency at the two peripheral sources. The authors state that 
‘These results are not compatible with the attentional explanation of noise effects proposed by 
Hockey (1970b)’: yet they are very similar in pattern to those obtained in that study. The only
real difference is that the improvement occurs for the two (low-probability) intermediate sources, 
as well as for the two (high-probability) central sources. This is a problem only for a narrow 
interpretation of the attentional selectivity hypothesis. In addition to any formal manipulation of 
priority in terms of probability it is possible that sources which are spatially linked to these 
‘natural’ loci of attention may also attract increased attentional resources. (They are, for 
example, more easily sampled at a small cost of overt scanning than the most peripheral 
sources.) It is not necessary to suppose that some additional effect of ‘visual funnelling’ is 
operating, since such a strategy could be part of a priority-controlled attentional sequence, 
though the possibility is not ruled out. Indeed, the aim of Hockey’s experiments was not to show 
that the noise effect was totally due to attentional changes, only that spatial effects per se were 
not sufficient to account for the changes observed. Having said this I have to admit that the 
monitoring data of Expt. IV show little sign of selectivity changes, and do suggest that the 
phenomenon lacks the necessary robustness to be reliably observed in the dual-task situation. I 
will return to these points after discussing the effects of the change in tracking requirements in 
these experiments. 

In none of the experiments is there any reliable effect of noise on time on target, though it is 
afforded the highest priority by instructions. I have already mentioned that previous experiments 
using this version of the tracking task show no facilitation either, even though no monitoring 
requirements were involved. The problem in interpretation here concerns the relationship 
between what is specified as high-priority by instructions and what is treated as such by 
subjects. Priority, as a psychological variable, has received little formal treatment, and its
parameters are poorly defined. Clearly, however, it must be determined by at least two factors: 
current goals or instructions, and subjective assessments of utility based on experience. The 
latter factor, which is a joint function of the ‘value’ of certain outcomes and the probability of 
successfully achieving these outcomes, is the one which concerns me here. The difficult tracking 
task may fail to retain its high priority because of the low success rate experienced by subjects 
(what is the incentive for allocating maximal attention to an activity in which only 30-40 per cent 
success is achieved?). This argument is not entirely ad hoc either. It is precisely the subjective 
assessment of probabilities of success that Hockey assumed to be the basis of differences in 
priority between central and peripheral sources in the secondary monitoring task (Hockey, 
1970a, b), since these task components are not distinguished by instructional biases. 
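
The argument can be made concrete with a toy calculation in Python. The multiplicative form (value times probability of success) is an assumption chosen for illustration, since the text says only that utility is a joint function of the two; the value weight of 1.0 is hypothetical, and the success rates are the midpoints of the two time-on-target ranges quoted above.

```python
def subjective_utility(value, p_success):
    """Toy utility: outcome value weighted by the chance of achieving it.

    The product form is an illustrative assumption, not a model from the paper.
    """
    return value * p_success


TRACKING_VALUE = 1.0  # hypothetical weight given to tracking by instructions

# Midpoints of the 60-70 and 30-40 per cent time-on-target ranges.
easier_tracking = subjective_utility(TRACKING_VALUE, 0.65)
harder_tracking = subjective_utility(TRACKING_VALUE, 0.35)

print(easier_tracking, harder_tracking)
```

Under this assumption the harder version of the task carries only about half the subjective utility of the easier one, even when the instructions give it the same nominal priority.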

What are the implications of this for the interpretation of Expts III and IV? The main point, I 
think, is that priority can only be used as an independent variable when it can be defined 
unequivocally in the task situation. If the tracking task is specified as ‘primary’ and subjects are 
achieving 60-70 per cent time on target, there is every reason to believe that it will retain its 
pre-emptive control of attentional resources, whereas 30-40 per cent may not be enough to 
convince subjects that it is such an important activity. As a result, any increased tendency to 
sample task information more selectively during the noise condition may be manifested in 
attention to other (formally secondary) aspects of the display. Differences between subjects in 
this reallocation (since there is no clear basis for it), and moment to moment changes in 
subjective assessment, will result, of course, in a variety of changes in the pattern of attention. 
These can only be interpreted in terms of selectivity changes if the priority rules are known — 
and they are not. 

Despite all these difficulties the fact remains that, in all four experiments, whereas noise 
generally seems to improve performance, the improvement is evident for only some and not all 
task components, while decrements are observed on some aspects of the task in all but the last 
experiment. Indeed, the data support the view that the effect of noise is to change the relative 
efficiency of different parts of the complex display, rather than being due to the non-specific 
arousal process which Forster & Grierson offer as their explanation for this varied set of data. 
Needless to say, the experiments offer no support at all to Poulton’s (1976) acoustic masking 
interpretation of Hockey’s original findings. Making the response switches silent produces an 
effect very much like the original, while (noisy) microswitches give a more general improvement 
in performance when subjects are deprived of this valuable cue. 


Concluding remarks 


The present set of studies therefore differs in a number of ways from those of Hockey. Even so, 
some of the data are quite consistent with the original findings, when the attentional selectivity 
hypothesis is examined more fully, while, given the highly probable differences in priority rules 
in the two situations, the remaining results cannot be interpreted. Added to this is the 
considerable reduction in noise level, which makes any effect less likely to be observed. 
Nevertheless, within the limitations of resources and information available to them, Forster & 
Grierson have conducted a number of careful studies, the results of which do at least suggest 
that the detailed phenomenon is not a robust one, and may not easily be generalized to 
situations which have only a superficial similarity. Perhaps the most important lesson from their 
work and from these comments is the need to specify the details of experimental method quite 
clearly. It is not always easy, of course, to know which methodological features are important 
and which trivial, but it is probably better to err on the side of overinclusion. 
Attentional selectivity: A rejoinder to Hockey 


Peter M. Forster 


In his reply to Forster & Grierson (1978), Hockey (1978) raises two main issues to account for 
the discrepancies between the findings of Hockey (1970a, b) and Forster & Grierson. These are 
that the latter authors interpreted the notion of attentional selectivity too narrowly and that their 
study is not sufficiently like that of Hockey’s to constitute a replication. Some reasons for 
thinking that Hockey’s arguments are not entirely adequate are offered. 

I concur with Hockey when he points out that the simplest form of the attentional 
selectivity hypothesis merely implies that in a multicomponent task the different components are 
not attended to equally, which Hockey (1970a, b) demonstrates by showing that performance on 
the primary task improves while that on the secondary task does not. He has, however, in the past 
appeared to support a stronger form of the hypothesis. Hockey (1970b, p. 41) states, ‘The 
increase in selectivity with noise ... thus seems best described as an enhancement of attention 
paid to sources already being given priority with a resulting withdrawal of attention from low 
priority sources. This appears to be true whether the strategies employed are based on 
differential instructions (primary vs. secondary task) or an induced expectancy (high vs. low 
probability monitoring sources).’ 

Evidence incompatible with selectivity based on induced expectancy has been discussed by 
Forster & Grierson. Further evidence on this point may be seen in Expts III and IV where, in 
the monitoring task, priority (as described by Hockey) favours signal sources 3 and 4 and yet the 
improvement in noise is greater for sources 2 and 5. Suggestive evidence incompatible with the 
weak form of the selectivity hypothesis also exists. For example, in Expt. IV there was a small 
improvement in tracking performance and also a small overall improvement in monitoring 
performance in noise. 

Forster & Grierson have discussed the problem of replication and with respect to the 
difference in sound pressure levels there is no more that can usefully be added. The problem of 
tracking merits further comment. I now accept that there were two cams, each giving different 
reversal rates, and it seems likely that Hockey used the slower cam and Forster & Grierson the 
faster cam. However, it is possible to vary the speed of rotation of the cam which is an 
important point when considering the experiments discussed here and the experiment reported 
by Hockey as a personal communication from D. E. Broadbent. The latter experiment 
investigated the effects of noise on a single tracking task only. Two groups were used, one 
which performed the task at a rate of 33 reversals per minute, and a second group which 
performed at a rate of 66 reversals per minute. Only the former increased their time on target 
(TOT) scores in noise. Hockey suggests that the performance of subjects in the Forster & 
Grierson experiments is similar to that of the second group. There are reasons for doubting this, 
however. In the single task experiment, the overall TOT with the fast cam in the quiet condition 
is 0.239 of the maximum possible. In Expt. III of Forster & Grierson, with a reversal rate of 
52 min⁻¹ and with a secondary task, subjects achieved a TOT score in the quiet condition of 
0.342, and in Expt. IV the corresponding TOT score with an increased reversal rate of 69 min⁻¹ 
was 0.329. There is a considerable discrepancy here which must be accounted for before the 
implications of the different reversal rates can be judged. Until an adequate resolution is 
achieved I must maintain that Hockey’s and our experiments were sufficiently alike for us to 
reasonably expect a pattern of results similar to Hockey’s if the phenomenon of selectivity as 
described by Hockey actually exists. 

Considering next the interpretation of data, Hockey states that the pattern of results obtained 
by Forster & Grierson, Expt. III, is very similar to that obtained by Hockey (1970b), and that 
the incompatibility is only a problem for a narrow interpretation of the selectivity hypothesis. I 
have nothing to add to the points mentioned above and will merely suggest again that it is this 
narrow interpretation which Hockey appeared to support, and from which his current position is 
a considerable departure. 

The next important point considered by Hockey is the question of ‘priority’ with respect to 
the tracking task. In a rather speculative account he suggests that our tracking task failed to 
retain priority because of the ‘low success rate experienced by subjects’. In none of the 
experiments reported either by Hockey (1970a, b) or by Forster & Grierson, are subjects given 
feedback about their tracking performance and, in fact, there are no adequate grounds for 
thinking that they experience a low success rate. 

The last important point that I would like to make is that the data of Forster & Grierson do 
not support the view that ‘the effect of noise is to change the relative efficiency of different parts 
of a complex display’. Some reasons are given above. Nor do Forster & Grierson actively 
support a non-specific arousal process; our results are more compatible with such a hypothesis 
by default only. 

The major thrust of this rejoinder is thus that the experiments of Forster & Grierson bear 
more than ‘a superficial similarity’ to those of Hockey and therefore their main conclusions must 
still stand, i.e. changes in attentional selectivity under loud noise are not easily demonstrated, 
and that this task is not suitable for investigating such effects as may exist. 
The structure of mathematical ability 


W. D. Furneaux and Ruth Rees 





A ‘mathematics’ test of 69 items was administered to 255 craft and technician students in the 17-19 age 
range. The item/item correlations were analysed using both a principal components and a 
maximum-likelihood method. After varimax rotation, the same structure emerged from both analyses. The 
first two factors were each defined by sets of items heterogeneous in terms of content and mathematical 
processes. Nearly all other factors could be understood as simple content or process factors. All the 
Thurstone PMA scales, which had also been administered, loaded strongly on Factor II’, and on no other. 
The results confirm and extend those from a previous study (Furneaux & Rees, 1976), and strongly support 
the view that there is a ‘mathematical ability’ factor independent of ‘g’. They also suggest an explanation for 
the equivocal results from studies of the structure of mathematical ability which derive from inter-test rather 
than inter-item correlations. 


In a previous paper (Furneaux & Rees, 1976) a brief review of some of the literature on the 
structure of ‘mathematical ability’ illustrated the discrepancy between such results as those of 
Hamley (1935), who stressed the overriding importance of ‘g’ in the determination of 
mathematical ability, and those of more recent workers such as Lee (1955) and Wrigley (1958) 
who claimed to have demonstrated an important group factor of ‘mathematical ability’ over and 
above the general factor, and separate from the number factor. 

In considering this discrepancy the present writers concluded that ‘the exploration of the 
dimensionality of mathematical ability has to be approached via the factorial analysis of 
inter-item, rather than of inter-test correlations’. Factor analytic studies of the inter-item 
correlations within a specially constructed 50-item ‘mathematics’ test were then reported. These 
had been conducted, separately, for (a) O-level secondary school pupils, (b) craft and technician 
college students, (c) college of education students, and (d) university undergraduates (total 
n = 1936). These analyses demonstrated the repeated appearance of two main factors (following 
varimax rotations). 

One of these (tentatively labelled a mathematical inference factor) was defined by virtually the 
same subset of 11 items, in each of the groups studied. These particular items had also been 
shown to be of interest in a previous study, in which they had turned out to be appreciably more 
difficult than was expected by teachers, who had judged the facility values of the other test 
items with reasonable accuracy. It must be stressed that although many of these ‘inference’ 
items did tend to have fairly low facility values in this previous study, it was the fact that they 
were more difficult than expected, rather than difficult in any absolute sense, that appeared to be 
the defining attribute of the whole subset. It should also be noted that these ‘inference’ items 
were the same items as were tentatively labelled as ‘core’ items in even earlier publications (e.g. 
Rees, 1973). 

The other rotated factor accounted for more of the variance than did the inference factor, in 
all the groups studied. In terms of the items defining it, however, it was slightly less stable over 
the groups tested. No attempt was made to label it in the previous paper, but it will initially be 
referred to as a mathematical knowledge factor in this paper, to facilitate discussion leading up 
to a suggestion for a more appropriate designation. 


The present study 


The study now to be reported was designed to provide a partial replication of the previous 
investigation, but using a larger set of 69 test items. As in the original test, a wide variety of 
items was included, but a substantial number were designed to have characteristics judged to be 
typical of the previous inference items, whilst others were expected to load on a knowledge 
factor similar to that previously defined. Some items had very close counterparts in the original 
test, but many did not. 

In addition to this new test, the Thurstone Primary Mental Abilities Battery for grades 9-12 
(Science Research Associates, 1962 revision) was administered to all subjects, so that the 
mathematics items could be mapped into a test space containing the PMA variables as reference 
points. 

In selecting the group of subjects to be studied, it seemed sensible to aim for relative 
homogeneity in respect of both intellectual ability in general, and mathematical ability in 
particular. This was expected to facilitate the emergence and identification of the Thurstone 
group factors, as well as possible mathematical group factors, within an orthogonal reference 
system. The sample tested therefore consisted of 159 students taking City and Guilds Craft 
courses, together with 96 taking City and Guilds Technician courses. The 255 students were 
divided between two technical colleges, with an age range of 17-19 years. 


Analysis 

The scores obtained by each subject on the several PMA subtests, and on each item in the 
mathematics test (right/wrong) were subjected to analysis using the SPSS program Principal 
Factoring with Iteration. A program using the Jöreskog unrestricted maximum-likelihood method 
was also applied to the same data, to see whether the two different models would lead to 
different structures emerging. Varimax rotation was performed on both solutions. 

The SPSS program is very popular, but a certain amount of research and inference have to be 
employed if the technical characteristics of the PA2 package are to be made explicit. It appears 
that an initial communality estimate is first made for each variable, based on its squared multiple 
correlation with all other variables. With these initial estimates in the diagonal, the matrix is then 
first completely factored (without iteration) by the principal components method (i.e. as many 
factors as there are variables emerge). The common factor space is then taken as being defined 
by those v factors which satisfy Kaiser’s criterion. From this point on, the program uses an 
iterative procedure to produce an eventual principal components solution which terminates after 
the vth factor has been computed, and in which successive estimates of the communalities are 
based on the immediately preceding factor estimates. The amount of variance attributed to each 
factor is expressed as a percentage of the total variance over all v factors, not as a percentage of 
the total variance in the original matrix. 
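
The procedure described above can be sketched in a few lines. This is a minimal illustration of the steps as the text states them, not the SPSS PA2 code itself; the function names `smc` and `principal_axis`, the convergence tolerance and the iteration limit are all assumptions introduced here.

```python
import numpy as np


def smc(R):
    """Squared multiple correlation of each variable with all the others."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))


def principal_axis(R, max_iter=100, tol=1e-5):
    """Iterated principal factoring of a correlation matrix R (hypothetical helper).

    Returns (loadings, communalities, pct), where pct is each factor's share
    of the variance over the v retained factors, not of the whole matrix.
    """
    R = np.asarray(R, dtype=float)
    h2 = smc(R)                      # initial communality estimates
    reduced = R.copy()
    np.fill_diagonal(reduced, h2)

    # Complete factoring without iteration; keep the v factors whose
    # eigenvalues exceed unity (Kaiser's criterion).
    v = int(np.sum(np.linalg.eigvalsh(reduced) > 1.0))
    if v == 0:
        raise ValueError("no factor satisfies Kaiser's criterion")

    for _ in range(max_iter):
        np.fill_diagonal(reduced, h2)
        vals, vecs = np.linalg.eigh(reduced)      # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:v]          # indices of the v largest
        kept = np.clip(vals[top], 0.0, None)
        loadings = vecs[:, top] * np.sqrt(kept)
        new_h2 = np.sum(loadings ** 2, axis=1)    # re-estimated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break

    pct = 100.0 * kept / kept.sum()
    return loadings, h2, pct
```

Because `pct` is normalized over the retained factors only, it corresponds to the percentages of the v-factor model rather than to percentages of the total variance in the original matrix.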

The Jöreskog package does not diagonalize the matrix. Starting with the total correlation 
matrix it operates on the basic log-likelihood function, maximizing it by a routine arithmetical 
procedure. 


Results 


No comparison of the results obtained from the two different methods of factor analysis will be 
attempted in this paper. Apart from some points of detail, the two sets of rotated factor matrices 
revealed the same basic structure. The results using the SPSS package will be used for purposes 
of discussion, since they are directly comparable with those reported for the earlier study 
(Furneaux & Rees, 1976). 

The SPSS package presented an unrotated solution in 23 factors (see ‘Analysis’ above) after 
iteration. These factors accounted for 49.6 per cent of the total variance in the original matrix, 
a not unreasonable figure, bearing in mind that 69 of the 73 variables involved were test items, 
and not tests. The first factor received positive loadings from all variables, except for two 
mathematics items which had minute negative loadings. It accounted for 31.3 per cent of the 
variance in the 23-factor model. It is of interest that, although all the mathematics items had 
been carefully designed and were judged by competent mathematicians to be concerned with 
‘mathematical’ operations, 17 of them (i.e. nearly a quarter of all those in the test) had loadings 
of less than 0.2 on this first factor. 


Table 1. SPSS factor loading matrix for rotated first six factors (n = 255) 

Item    F (%)
3       91.0
4       72.9
7       34.5
8       72.9
11      51.4
17      42.7
18      17.6
19      52.5
20      20.4
21      22.0
22      32.2
24      63.1
25      25.9
26      50.2
27      74.1
30      87.1
31      24.3
32      22.0
33      65.1
34      13.7
37      44.7
38      25.1
44      11.8
45      15.7
10      82.0
13      57.3
28      34.1
29      63.9
39      80.4
41      52.2
48      58.0
50      71.4
51      71.0
54      33.3
59      40.8
60      40.0
62      72.2
64      38.8
66      50.6
68      46.3
69      43.9
V
N
R
S

[Loadings are listed by column in the order printed; their row alignment with the items above is not preserved.] 

I’ (22.5%): 0.73, 0.58, 0.70, 0.46, (0.27), 0.49, 0.70, 0.61, 0.63, 0.56, 0.35, 0.61, 0.54, 0.81, 0.78, 0.34, 0.63, 0.63, 0.63, 0.76 
II’ (16.5%): 0.33, 0.30, 0.42, 0.43, 0.51, 0.38, 0.60, 0.40, 0.36, 0.36, 0.32, 0.52, 0.44, 0.35, 0.37, 0.46, 0.63, 0.73, 0.78, 0.62 
III’ (4.3%): 0.44, 0.67, 0.56, 0.32, (0.29) 
IV’ (3.4%), V’ (3.9%), VI’ (4.5%): 0.60, 0.73, 0.57, 0.71, 0.54, 0.59 

Note: 
The percentage variance accounted for by each factor is shown in parentheses below the relevant column 
heading. 
Facility value (F) is the percentage of students selecting the correct solution. 
Loadings less than 0.30 are omitted except for borderline items (11 and 45). 
In the ‘item’ column, V, N, R, and S refer to the Thurstone variables, Verbal, Number, Reasoning and 
Space. 

Although all 23 factors had eigenvalues greater than unity before iteration, there was a very 
marked ‘scree’ effect, and only the first nine continued to satisfy the criterion after iteration. 
Between them, these latter accounted for 73.2 per cent of the total variance in the 23-factor 
model. The last four factors extracted (20-23) all had eigenvalues of less than 0.6. 

Varimax rotation within the 23-factor space produced a structure in which only the first six 
rotated factors satisfied Kaiser's criterion. 
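
For readers unfamiliar with the rotation step, the following is a minimal sketch of an orthogonal varimax rotation followed by a check of which rotated factors pass Kaiser's criterion. It uses a standard SVD-based iteration and is not the SPSS routine; the names `varimax` and `kaiser_retained`, and the reading of the criterion for a rotated factor as a sum of squared loadings greater than unity, are assumptions made for illustration.

```python
import numpy as np


def varimax(loadings, max_iter=500, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    A = np.asarray(loadings, dtype=float)
    p, k = A.shape
    T = np.eye(k)                    # rotation applied to the original loadings
    objective = 0.0
    for _ in range(max_iter):
        B = A @ T
        # Gradient of the varimax criterion with respect to the rotation.
        grad = A.T @ (B ** 3 - B @ np.diag(np.sum(B ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt                   # orthogonal polar factor of the gradient
        if s.sum() < objective * (1.0 + tol):
            break                    # no further improvement
        objective = s.sum()
    return A @ T


def kaiser_retained(rotated):
    """Indices of factors whose sum of squared loadings exceeds unity."""
    ss = np.sum(np.asarray(rotated, dtype=float) ** 2, axis=0)
    return np.flatnonzero(ss > 1.0), ss
```

Since the rotation is orthogonal, each variable's communality is unchanged; only the distribution of variance among the factors alters, which is why fewer factors may satisfy the criterion after rotation than before.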

Table 1 sets out the loadings of 41 of the 69 mathematics items, and of the four Thurstone 
PMA scores, on each of them. Loadings of less than 0.3 are not included. The 28 mathematics 
items which do not appear in the Table each had their only loadings on factors appearing later 
than the sixth. The seventh factor was defined by four of them, and no other factor attracted 
more than two. 

For each of the first two factors, the correlation between facility values and factor loadings 
was calculated, and was also inspected graphically. For factor I’ there was no evidence of linear 
or non-linear relationship on graphical inspection, and the product moment coefficient was 0.21 
(n.s.). For factor II’, however, an approximately linear positive relationship appeared, with 
r = 0.59. 

Discussion 

The first point of note is the very wide dispersion of the mathematics items within the 
test-space. It has already been remarked that nearly a quarter of them had loadings of less than 
0.2 on the unrotated first factor. After rotation, about half of them emerged with all their main 
loadings on factors 7 to 23, none of which accounts for more than about 3 per cent of the 
common factor variance. In addition, eight of the 23 factors attract significant loadings from only 
one item each, and another seven from only two items each. Although all the items were very 
carefully designed, it is clear that about half of them are making no significant contribution to 
the assessment of any kind of underlying determinant which might be called ‘mathematical 
ability’. 

If factors defined by only one item each are ignored, then it becomes clear on inspection that 
virtually all factors beyond the first three are associated with fairly obvious similarities of 
content or process within the subsets of items which define them. It is of interest that several of 
the rotated factors appearing after the sixth were very clearly of this kind, since in terms of 
Kaiser's criterion they would tend to be dismissed as statistical artifacts. The proportion of the 
total test variance associated with any such content or process factor could presumably be 
increased to any extent desired, by manipulating the relative numbers of different kinds of items 
within the test. Bearing this in mind, the importance of factors I’ and II’, which between them 
account (in the present analysis) for some 40 per cent of the common factor variance, lies not so 
much in the amount of the variance they are concerned with as in the heterogeneous 
characteristics of the items which load on them. As in the earlier paper (Furneaux & Rees, 1976) 
competent mathematicians have not been able to suggest either content or process attributes 
which could account for the generation of either factor. For these two therefore, and for these 
two only, it seems justifiable to seek explanatory principles of a relatively general kind, such as 
(for example) underlying abilities which play a major part in determining performance over a 
wide range of items. 

Factor II’ can confidently be identified as some kind of general intellectual ability factor, as all 
the PMA scales have substantial loadings on it, and on no other factors. It is perhaps not 
inappropriate that Number and Reasoning load a little more strongly than Verbal and Space, but 
quite a small further rotation of the vector would alter these relativities substantially. The 
correlation of facility values and loadings within this factor, remarked on earlier, is of interest. 
Such a correlation may sometimes imply that the factor is a statistical artifact resulting from the 
use of tetrachoric type correlation methods on items whose facility values are predominantly 
very high, or very low. In the present case, however, all save two of the items have F values 
within the limits of 30 and 70 per cent, and high and low values are almost equally represented. 
In addition, there seems to be no reason why a factor arising as a statistical artifact should 
attract loadings of around 0.70 from the PMA variables. The existence of this correlation is in 
fact entirely consistent with the notions about the structure of ‘intelligence’ that one of us has 
set out elsewhere (Furneaux, 1960). The view expressed is that test items which require a 
relatively long period of time for a correct solution to be achieved involve ‘continuance’ (akin to 
persistence) as a determinant, in addition to the speed/accuracy interaction which influences 
success with all items. When a particular PMA battery is applied to a sample for which it is 
appropriate (as was the case in the present study) almost all the items in all the subtests can be 
solved quite quickly, so that continuance is not a determinant, and the general factor defined by 
such a PMA battery is almost entirely concerned with the speed/accuracy interaction. Factor II’ 
might therefore legitimately be designated a speed/accuracy factor, rather than simply a general 
intellectual ability factor. If it were to be so designated, then it would be entirely understandable 
that, within the subset of mathematics items having loadings on factor II’, the highest loadings 
would be associated with the easiest items, since it would be these that involved speed/accuracy 
only, whereas the more difficult ones would also involve continuance. It is planned to check the 
validity of this interpretation by adding adequate measures of speed, accuracy, and continuance, 
to the variables used in the present study, and then repeating the analysis. Until this has been 
done, it is perhaps best to continue to identify factor II’ as some kind of ‘g’ factor. 

There is a considerable overlap of item characteristics as between the set defining factor II’ in 
the present study, and those of the several slightly different sets of items which defined the first 
varimax rotated factor which emerged in each of the subgroups studied in the earlier 
investigation (Furneaux & Rees, 1976). This suggests that the latter (tentatively designated as a 
mathematical knowledge factor in the introduction to this paper) was in fact a ‘g’ factor also. 

The interpretation of factor I’ seems clear. Many of the items loading on it are similar to the 
core items which loaded on the second varimax rotated factor (the mathematical inference 
factor) in the previous study. The remainder are items which were designed for the new 69-item 
test in the expectation that they would behave like core items. It seems reasonable to suggest 
that the previous second rotated factor and the present factor I’ are identical, and that their 
designation as inference factors should be retained. 

In the present study, the ratio of the variance accounted for by the ‘g’ factor to that 
accounted for by the inference factor is about 2:3. In the previous investigation this ratio varied 
between about 2:1 and 1:1, depending on the subgroup considered. There is little point in 
attempting to decide between the several interpretations which might be suggested to account for 
the much greater importance of the inference factor in the present results. What is perhaps more 
relevant is that this difference certainly suggests that the importance of factors other than ‘g’, as 
determinants of mathematics test performance, can vary considerably in ways which relate to 
the exact make-up of the test. 

Although there do appear to be recognizable differences between the characteristics of ‘g’ 
type and of ‘inference’ type mathematics items, these do not seem to be immediately obvious on 
casual inspection, either to mathematicians or to psychologists. It is therefore largely a matter of 
chance whether a researcher constructing a ‘mathematics’ test finishes with an instrument that 
measures largely ‘g’, largely ‘inference’, or some admixture of both. This hazard may provide a 
sufficient explanation for the long persistence of the controversy as to whether a group factor of 
mathematical ability, distinguishable from ‘g’, does actually exist. 

A further study has now been completed, in which students’ routes to the solutions of these 
two kinds of problems have been monitored using language laboratory facilities. The detailed 
results will be made the subject of a further communication, but there does seem to be evidence 
that the two kinds give rise to recognizably different types of difficulty. The present findings may 
therefore have implications for the design of the mathematics curriculum. 

It may be of interest that professional mathematicians who have been approached by one of 
us (Ruth Rees) appear to have much greater difficulty in recognizing and defining the difference 
between ‘g’ items and ‘inference’ items than do mathematics students, or those with limited 
mathematical knowledge. It is possible that this inability to recognize a category of mathematics 
items having special characteristics when judged by students, is associated with the fact that the 
latter experience ‘inference’ items as being more difficult than their teachers would expect, 
whereas there is no such discrepancy in the case of ‘g’ items. 

So far as can be judged on the evidence so far available, a ‘g’-type mathematics item presents 
a clearly structured problem which can be solved provided that some standard programme of 
operations has been learned. An inference type item, on the other hand, requires the ability to 
conceptualize the problem in such a way that the relevant operations can first be identified, and 
then applied in proper combination and sequence. It seems surprising that such inferential ability 
should emerge as quite independent of ‘g’, although the relatively homogeneous characteristics 
of the samples so far studied may be partly responsible for this. 
A recent commentary on the state of the art in cognitive psychology (Allport, 1975) concludes that the field 
is characterized by ‘an uncritical, or selective, or frankly cavalier attitude to experimental data; a pervasive 
atmosphere of special pleading; a curious parochialism in acknowledging even the existence of other 
workers, and other approaches, to the phenomena under discussion; interpretations of data relying on 
multiple, arbitrary choice-points; and underlying all else the near vacuum of theoretical structure within 
which to interrelate different sets of experimental results, or to direct the search for significant new 
phenomena’. One would be hard put to come up with a more damning set of conclusions. 

If the overarching theoretical structure that is so urgently needed is to come from anywhere, it will come 
from the area of memory, currently the hottest and most sophisticated within cognitive psychology. The 
question is: is memory theorizing yet in a position to provide such a general theoretical framework? A good 
place to start looking is the new edition of Norman’s little introduction, Memory and Attention. 

The most striking thing about this second edition is how little resemblance it bears to the first edition. 
Clearly much has changed since 1969. The intervening years have seen the birth of semantic and episodic 
memory, of the depth of processing hypothesis, of the encoding specificity principle, of elaborative encoding 
and associative networks, of ‘memory for real-world events’ and models of language comprehension; and 
the demise of short-term memory, the paired-associate, interference theory and the whole verbal learning 
tradition so roundly attacked by Tulving & Madigan (1970). Things certainly look a whole lot more 
sophisticated. But is it a necessary and fruitful sophistication, or rococo speculation that runs so far ahead 
of the facts that its only justification is decorative rather than functional? 

It is undeniable that most of the models around bristle with Allport’s ‘multiple and arbitrary 
choice-points’. In other words, each contains so many assumptions that are not required by existing data that 
they are not falsifiable by new studies. But whether this is necessarily a bad thing depends very much on the 
pretheoretical values that one holds. For some, parsimony is an essential constraint, while for others 
Occam’s Razor cuts very little ice. Marvin Minsky and John Anderson, for example, agree that ‘parsimony 
is still inappropriate at this stage, valuable as it may be in later phases of every science. There is room in the 
anatomy and genetics of the brain for much more mechanism than anyone today is prepared to propose, and 
we should concentrate for a while more on sufficiency and efficiency rather than on necessity.’ 

To criticize a cognitive theory for having more assumptions than are strictly justified is to misunderstand 
the nature of the game. What cognition needs - what Allport is calling for - is not a complete, coherent, 
accurate theory qua explanation of higher mental processes, but a language, a repertoire of concepts that is 
sufficiently rich to enable the formulation of interesting and testable propositions in a wide variety of areas, 
all of which spring from the same underlying cognitive view, and which are therefore mutually compatible 
and mutually informative. To criticize a proposal such as ACT, the new theory spelt out by J. R. Anderson 
in Language, Memory and Thought, as being non-falsifiable is like criticizing the English language for the 
same offence. The evaluation of such a ‘language’ depends on a slowly emerging and often implicit 
consensus about its usefulness. 

The corollary of this is that the development of a theoretical toolkit is constrained by its usefulness in 
unlocking the target area, as much as by internal considerations of formal elegance, consistency and the like. 
Some of the approaches surveyed by Norman, of which his own ‘LNR group’ is the chief culprit, fail to see 
this, and take the temporary suspension of Lloyd Morgan’s Canon as an open invitation to a feast of 
theorizing in which rules, principles, concepts and assumptions pop up and vanish again at bewildering speed. 
What is needed is a fundamental set of theoretical assumptions that is full enough and clear enough to 
prescribe in some detail the avenues of development that the theory will take when applied to any particular 
area. What we have at the moment resembles a higgledy-piggledy shanty town which has developed on the 
basis of many, very parochial little decisions. ACT, as we shall see, gets a lot nearer the ideal than any other 
current proposal, though I shall argue that it also fails quite seriously in the end. Before looking at ACT in 
detail, though, some more general comments need to be made about the theoretical context in which it has 
appeared, as represented by the selections in Memory and Attention. 

The first point is, then, that while we must suspend a strict criterion of testability at the moment, we 
cannot do without the looser constraint of usefulness, and that some current memory ‘models’ have moved 
so far away from considerations of psychological performance that they appear to be simply talking 
themselves to death. 

Second, cognitive theorists are still remarkably lacking in common sense. The number of trees that have 
died in order to allow the psychological world to debate such subtle and tendentious issues as whether what 
people know influences what they learn (‘the elaborative encoding’ hypothesis of Bransford, Franks and 
others); or whether people's knowledge and expectations influence what they see (‘the data-driven’ vs. 
‘conceptually driven’ process dichotomy of Norman); or whether how deeply you understand something 
influences how well you remember it (the ‘depth of processing’ hypothesis of Craik & Lockhart), or 
whether the context in which you learn something influences how you store and retrieve it (the ‘encoding 
specificity hypothesis’ of Tulving), is an ecological disgrace. In all these cases the question ‘whether?’ is a 
non-issue. What matters is where, when and how these influences operate; and progress on those fronts is 
remarkably slow, not only because psychologists seem only able to cope with Yes/No questions, but also 
because their lack of ingenuity at inventing experimental paradigms means that the study of ‘where?’ and 
‘when?’ is severely cramped by the poor range of officially accredited situations from which one can choose. 

Third, the psychology of memory remains stunted by the lack of a decent psychology of learning to partner 
it. Right from the word go the explanation of memory and of memory change must go hand-in-hand, for a 
mutable memory will be fundamentally different from an immutable one. The consequences of ignoring this 
are glaringly obvious in every existing model: learning is treated only as bolting extra bits on to a structure 
that may expand, but cannot modify. Yet a long time ago Piaget told us that the process of learning can be 
viewed from two indissociable and complementary perspectives: the input is interpreted and modified by the 
system on which it falls (assimilation) and in the act of receiving the input the existing system is itself 
modified (accommodation). When you retrieve a word’s meaning in the course of understanding a sentence, 
that meaning is itself subject to modification. Any discussion of this issue — which is actually the more 
general one of inductive learning — is conspicuous by its absence in Norman’s book. 

Fourth, it is impossible to ‘decouple’ (Reitman, 1970) memory from other cognitive activities such as 
perceiving, reasoning, problem solving and imagining. If you try to build a model of memory ‘pure’ and then 
apply it, you inevitably find that your nice neat garden suburb is turning into a shanty town. As before, your 
model of memory is a tool, and it has to be designed from the start with a job in mind. 

Physics has had to come to terms with the fact that what a system will tell you about itself depends on the 
way you phrase your question. Different tasks set for the same system may produce apparently 
contradictory pictures of the system. This ‘principle of complementarity’ needs absorbing into psychology. 
We would then not assume that, for example, a semantic judgement task (‘Do dogs have wings?’) gives us 
any more privileged information about ‘the’ structure of semantic memory than any other. ‘The’ structure as 
it appears in sentence comprehension, or in pattern recognition, or in concept formation, may be entirely 
different. 

The ignoring of purpose in designing models of memory, and the assumption that such models can contain 
a component that is invariant across tasks, is reflected in another familiar kind of statement, examples of 
which recur frequently throughout Norman’s book: ‘...the purpose of long-term memory is to record facts 
about various things, events, and states of the world’; or ‘...the organism is normally concerned only with 
the extraction of meaning from the stimuli’. If this were so, we, like Tolman’s rats, would remain lost in 
thought - but it is not. The goal of ‘forming a meaningful representation’ of the world is subordinate to the 
goal of doing something about it. And as long as cognitive modellers ignore their behaviourist colleagues’ 
concern with action and its consequences, their products will remain fundamentally incomplete. 

One welcome change in the last few years, incidentally, is the waning enthusiasm for large structural 
components (like ‘Primary Memory’ and ‘Secondary Memory’), which means there are rather fewer boxes 
to stumble over in the dark. It seems to me, however, as I am trying to make clear, that much of the 
thinking that motivated them persists. 

Fifth, a memory model is determined not only by the task it is being constructed to do, but also by the 
builder’s fundamental and often unconscious beliefs about the nature of man. One such belief which is often 
lurking but rarely exposed is that there is, separate from the cognitive system, or model of memory, or 
whatever, something called the ‘person’ or the ‘self’ that chooses, selects, evaluates and directs the 
activities of the system. This statement from Atkinson & Shiffrin, quoted in Memory and Attention, is 
typical: ‘The term control processes refers to those processes that are not permanent features of memory, 
but are instead transient phenomena under the control of the subject’ (my emphasis). Who or what is this 
mysterious agent that is neither the memory, nor its control processes, but seems always to be around to 
temper the predetermined churnings of the memory machine with human wisdom, and to kick it when it gets 
stuck? Psychologists are always looking for ways of improving on common-sense: yet it is, I fear, only 
common-sense that prevents us from writing off this ghostly agent — by writing him into the machine. Despite 
the way it seems to us, there is no such ghost, and it is the job of a cognitive model to explain (a) how 
things really happen, and (b) where the illusion of the homunculus comes from. 

Sixth, what we do is controlled not only by the current state of the world and by the long-term structural 
attributes of the interpreting memory system, but also by the more transient state that that system happens 
to be in; and this is heavily influenced by fluctuating needs and feelings - i.e. by input from within the 
organism that primes certain desirable consequences. Again, therefore, a model of long-term memory must 
contain ab initio the wherewithal to represent motivational and emotional influences. None of the network 
models about (and this includes ACT) have anything at all to say on this subject, and their silence is 
inadequate. 

Seventh, the failure to build memory models that subserve action in general leaves the research in 1978 
still preoccupied with verbal performance. And although we can now discuss paragraphs and even whole 
stories, instead of just word pairs, the performance that is measured is still very much verbal remembering 
as an end in itself, rather than retrieval in the context of, and as a means to, significant action. Which 
brings us back to the previous point that what all the work on verbal learning tells us about is 
‘memory-as-it-appears-in-verbal-learning-tasks’, not ‘memory’. 

Finally, the continuing lack of a general framework to stimulate research means that experimental work on 
memory is still very much ‘phenomenon-driven’. Somebody has a bright methodological idea — like 
Bransford & Franks’ original experiments on automatic inference drawing in sentence comprehension, or 
Collins & Quillian’s semantic judgement latency task - and immediately eager associate professors and their 
graduate students set to, to pull it apart by using three-syllable instead of two-syllable words, or introducing 
a 15 sec delay, or trying it on 12 year olds. This is due partly, I am sure, to what Sir Frederic Bartlett might 
have called the ‘effort after publications’. Everyone in psychology knows that trying too hard makes you 
less creative, but we don’t often realize the force with which this applies to our own research efforts. 

So much for the moment for the state of the art: in general we must conclude that despite the high 
productivity in the memory industry the actual growth has not yet been sufficient to provide a framework 
that can cohere and direct research in cognitive psychology. Furthermore the lack of progress is not a result 
of not enough effort, but of the narrowness of terms of reference that researchers have implicitly accepted, 
and which have effectively nobbled the enterprise from the beginning. 

One possible exception exists, however — John Anderson, who in the breadth, depth, quality and sheer 
volume of his work is something of a Superman in memory research. If anything can give us what we need, 
ACT, his latest model, can. Anderson characterizes the ACT system thus: ‘Memory is a propositional 
network of inter-connected nodes. A small portion of this network is active at any one time. Activation can 
spread down network paths from active nodes to activate new nodes and paths. To prevent activation from 
growing continuously there is a dampening process which periodically deactivates all but a select few nodes 
(on the Active List). There is also a set of productions which provides the system’s procedural component. 
The condition of a production specifies that certain features must be true of the active portion of memory 
and the actions specify certain changes to memory. Each production can be conceived of as an independent 
“demon”. Its purpose is to see if the memory configuration specified in its condition is satisfied in the active 
portion of memory. If it is, the production will “fire” and cause changes to memory. In so doing it can allow 
or disallow other productions which are looking for their conditions to be satisfied. It is assumed that there 
are “external interfaces” that translate the input into network representations and which can translate the 
activation of network structures into responses. ACT provides no model of these interfaces but rather of the 
cognitive processing that intervenes between the interfaces.’ (p. 122). 
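
The mechanics of this thumbnail sketch can be made concrete in a few lines of Python. What follows is a minimal illustration under stated assumptions, not Anderson's implementation: the network contents, the contents of `ACTIVE_LIST`, the single production and the dampening period `D` are all hypothetical values chosen for the example.

```python
# Toy sketch of the ACT cycle described in the quotation (illustrative only).
# Hypothetical network: node -> nodes reachable by one link.
NETWORK = {
    "dog": ["animal", "bark"],
    "animal": ["living thing"],
    "bark": ["tree"],
    "tree": [],
    "living thing": [],
}
ACTIVE_LIST = {"dog"}  # nodes that survive dampening (assumed)
D = 2                  # dampening period in time units (assumed)


def spread(active):
    """Spread activation one link outwards from every active node."""
    new = set(active)
    for node in active:
        new.update(NETWORK.get(node, []))
    return new


def dampen(active):
    """Deactivate everything that is not on the Active List."""
    return active & ACTIVE_LIST


def production_is_animal(active, memory):
    """A 'demon': fires if its condition holds in the active portion."""
    if {"dog", "animal"} <= active:
        memory.add(("dog", "is-a", "animal"))  # action: change memory
        return True
    return False


def run(steps):
    """Run the spread / fire / dampen cycle and record the active nodes."""
    active, memory, trace = set(ACTIVE_LIST), set(), []
    for t in range(1, steps + 1):
        active = spread(active)
        production_is_animal(active, memory)
        if t % D == 0:  # periodic dampening every D time units
            active = dampen(active)
        trace.append(sorted(active))
    return trace, memory
```

Calling `run(3)` shows the pattern of activation growing for one step, collapsing back to the Active List at the second, and starting to grow again at the third.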

This small acorn ramifies into 530 pages of closely reasoned foliage, which includes arguments from 
artificial intelligence, formal semantics and propositional logic as well as quite a number of new experiments 
and computer simulations. John Anderson is a formidable and multi-talented researcher. However, it is the 
nature of the acorn that is crucial, and it is on this that I wish to concentrate my scrutiny. 

The first point to pull out of Anderson's thumbnail sketch is the distinction between declarative and 
procedural knowledge. Perhaps the most basic assumption in ACT is that there is a ‘passive’ memory, that 
contains ‘facts’ - a ‘knowing that’ memory - and an ‘active’ memory, called the production system, that is 
the functional, ‘know-how’ bit. These two systems have quite different structural, as well as functional, 
attributes. The declarative part is the familiar propositional network, differing in detail, but not in 
conception, from those of Collins & Quillian, the LNR group, or Anderson & Bower’s earlier HAM. The 
procedural part contains ‘if-then’ instructions, which are not organized with respect to each other, and 
which operate on the network to alter the pattern of its connections or of its activation. Anderson rightly 
stresses that this distinction is not empirically decidable, and that it derives from a strong ‘pretheoretical 
bias’ - or intuition - rather than from reason or experiment. But throughout the book he continually 
equivocates about it, and I suspect that as a result of ACT, his intuition is a lot more uncertain than it was in 
Chapter 1. Chapter 6, for example, is prefaced with this quotation from Allan Newell: ‘The wrong way to 
conceive of it is with the Production System as the active net-interpreter, and with the semantic net as an 
associative data structure. The right way is that the production system is the associative structure itself.’ Yet 
the entire superstructure of ACT presupposes the ‘wrong way’, and Anderson as a result is confronted with 
decisions that are frequently arbitrary, sometimes strongly counter-intuitive, and nearly always of no 
psychological interest; for with Newell, I believe that the declarative-procedural distinction does not reflect 
psychological reality. All knowledge can be conceived of as being procedural. Declarative knowledge is 
simply that subset of procedural knowledge that can be ‘declared’. The activation of part of our knowledge 
leads to non-verbal behaviour, and the activation of another part leads to verbal behaviour. I can see 
absolutely no reason to suppose that the form in which these two parts are represented is different. It is 
difficult to see how it could be, for on different occasions the same knowledge may manifest itself as verbal, 
and as non-verbal action. And in theory, Anderson agrees: ‘...although it might seem more appropriate to 
represent the knowledge that “George Washington was the first President of the United States” as a 
declarative statement, one could imagine this knowledge embodied in a set of procedures which would 
enable performances of the various actions (such as verbal reports) which manifest knowledge of this fact’ 
(p. 117). Yet the agreement is grudging and it does not influence ACT. While it is undeniable that linguistic 
knowledge does have special features - the subject-predicate structure, truth values, its relatively 
purpose-free nature and the fact it is not usually acquired by practice, for example - Anderson would have 
done better to express this in terms of the nature of the linguistic production system, rather than as a 
fundamental difference in representational format. He could then have avoided — or at least answered in a 
much less ad hoc way — questions about how you decide whether any particular piece of knowledge is to be 
represented declaratively or procedurally, and how linguistically acquired information comes to control 
non-verbal behaviour. 

Part of the problem here is that much of the language and attitudes of models like ACT stem from 
computer simulation research, in which the program-store/data-base distinction has only very recently been 
called into question. If part of your knowledge is represented as ‘operations’, it is very easy to assume that 
another part is ‘operated on’. It is much less easy to see that the operations are operating on each other, in 
an endlessly parasitic fashion. But it is possible to preserve the logic while avoiding the trap if you cast your 
model not in electronic, but in physiological, terms. From a structural point of view, a ‘conceptual nervous 
system’ can be visualized as very like a (declarative) network. Yet functionally the routing of activity 
through the network is determined by the nature of the network itself. What in ACT is represented by the 
selection of a production which then operates on the network can be represented in a neuronal net by the 
way in which activity builds up at a choice point until one ‘neuronal-threshold’ is exceeded. The a priori 
levels of these thresholds, together with the structure of the net, are sufficient in themselves to determine 
how the pattern of activation will change. 
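
A rough sketch may help to show how thresholds alone can do the work that ACT gives to a selected production. It is an illustration only: the branch names, input rates and threshold values are invented, and the linear build-up of activity is an assumption made for simplicity.

```python
# Illustrative sketch: routing by thresholds, with no separate production.
# All branch names, rates and thresholds are hypothetical.

def route(input_rates, thresholds, max_steps=1000):
    """Accumulate activity on each branch at a choice point.

    Returns (winner, step): the first branch whose accumulated activity
    exceeds its own threshold, and the step at which that happened.
    Returns (None, max_steps) if no threshold is ever exceeded.
    """
    activity = {branch: 0.0 for branch in input_rates}
    for step in range(1, max_steps + 1):
        for branch, rate in input_rates.items():
            activity[branch] += rate
        crossed = [b for b in activity if activity[b] > thresholds[b]]
        if crossed:
            # If several cross on the same step, take the largest excess.
            winner = max(crossed, key=lambda b: activity[b] - thresholds[b])
            return winner, step
    return None, max_steps
```

With rates `{"speak": 0.5, "reach": 0.25}`, lowering the threshold on ‘reach’ is enough to send activity down that branch even though ‘speak’ receives the stronger input; nothing outside the net has to choose.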

The second aspect of ACT that deserves comment is the distinction between nodes and links in the 
declarative network. Anderson treats these as separable, like the ‘atoms’ and ‘rods’ that one gets in school 
kits for constructing models of molecules. In doing so he is only following what is standard practice in 
contemporary memory modelling, but a neurophysiological perspective again demonstrates that such a 
distinction is unnecessary. Structurally we may say that a neurone comprises a cell body, an axon, and 
dendrites, but functionally it is all of a piece. 

ACT also has two distinct ‘strength’ parameters — one associated with the network, the other with the 
productions. But again if the network is viewed in conceptual nervous system terms, as determining its own 
destiny, without external influences, then a single strength parameter, which we might call the ‘excitability’ 
of a unit within the network, is sufficient. (I am not out to sell a neo-Hebbian model here, only to 
demonstrate that there are alternatives to ACT that bypass some of the problems that it makes for itself.) 
Incidentally, Anderson is very ambivalent about his strength concept — not surprising, as in 1972 he was 
arguing most strongly against the necessity or even usefulness of any such idea (Anderson & Bower, 1972). 

There is another way in which the node-and-link view runs into difficulties, and that is in assuming that 
concepts are atomic, and therefore relatively well defined and immutable. This follows from Anderson's 
basically associationistic stance. When words occur together in a sentence, the representation that results is 
one which links together the nodes that subserve the appropriate verbal concepts. There may be ‘elaborative 
encoding’ in the sense that more nodes and links are built into the representation than are strictly required, 
but the ‘nodes’ themselves remain essentially unaltered by their interactions. ‘When using a concept in a 
particular proposition it is not necessary to write into that proposition all the information known about or 
defining that concept. Rather, just a pointer (token) is introduced to that concept (type) in memory where the 
definition of the concept is stored’ (p. 152). And, ‘An attempt is made in propositional networks to use just 
one node to represent each concept, individual, relation, proposition, etc.’ (p. 153). Results such as those of 
Tulving and his associates (e.g. Tulving & Thomson, 1973), showing that the representation of a word is 
influenced by its context (the ‘encoding specificity principle’) are interpreted as reflecting the selection of 
one of a number of discrete senses of the word that already exist. In other words, there is one node per 
concept, but there may be several, or many, concepts per word, and which is selected is influenced by the 
context. Encoding the occurrence of a familiar word is a matter of selection and association, not of 
modification or creation. 

Now this can work plausibly enough for effects where what we normally understand by the sense of a 
word is definitely changed — for example, from ‘the animal’s bark’ to ‘the tree’s bark’ (Light & 
Carter-Sobell, 1970). But Anderson uses the term ‘sense’ much more widely than this. In one of his 
examples, he lists as senses of the word bear, ‘A large angry grizzly bear chased a fox’ and ‘A black bear 
begged for food in the park’ (p. 384). While these sentences contain references to different types of bear, 
they do not, to me, represent different senses of the word ‘bear’. It is not surprising that, with the aid of this 
linguistic wriggle, Anderson can maintain his selection hypothesis. However, a little later, Anderson himself 
notes that this cannot apply to paragraphs, and elaborative productions have to be invoked. As there is no 
way of predicting, in ACT, when ‘elaborative productions’ will be used, they are free to appear whenever 
the selection hypothesis runs into trouble. Much of Anderson’s difficulty stems from assuming that natural 
language concepts are well-defined, and can therefore be treated as atomic. But they are not; they are fuzzy, 
representing clusters of features some of which are central, and some more peripheral. When a concept 
occurs in a context, the context serves to select that subset of features that is most appropriate, so that 
what is understood by the concept is partially determined by its context. The ideas are not simply associated, 
they modify each other. Any model of language comprehension which acknowledges this must concern itself 
with the details of the processes whereby these interactions between concepts occur, and must move 
towards a very different conceptualization of the representation of verbal information from that offered by 
John Anderson. This failure and the procedural-declarative distinction are the two most serious 
faults in ACT, and they serve to set it off at high speed, and with great energy and precision, on the wrong 
trail. 
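
The contrast between selecting a ready-made sense and modifying a fuzzy concept can be sketched as follows. This is only an illustration of the alternative being argued for: the feature names, weights, context adjustments and threshold are all invented.

```python
# Sketch: a concept as a weighted cluster of features, modified by context.
# Feature names and weights are invented for illustration.
BEAR = {"animal": 1.0, "furry": 0.9, "dangerous": 0.6, "begs for food": 0.2}


def interpret(concept, context_boost, threshold=0.5):
    """Return the features of `concept` that are salient in this context.

    `context_boost` maps feature -> amount added (or subtracted) by context.
    Central features (high weight) survive most contexts; peripheral ones
    appear only when the context supports them.
    """
    salient = {}
    for feature, weight in concept.items():
        adjusted = weight + context_boost.get(feature, 0.0)
        if adjusted >= threshold:
            salient[feature] = adjusted
    return salient


# 'A large angry grizzly bear chased a fox'
chased = interpret(BEAR, {"dangerous": 0.3})
# 'A black bear begged for food in the park'
park = interpret(BEAR, {"begs for food": 0.5, "dangerous": -0.3})
```

Here there is one concept, not two senses: the central features survive in both sentences, while the peripheral ones come and go with the context.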

The remaining points are largely contingent on, and subordinate to, these first two. The first concerns the 
periodic dampening assumption, which illustrates two general features of ACT - that it is often ad hoc, and 
that it is still moulded too much by its childhood in the world of computers. The notion of a moving pattern 
of activation within the memory network is an important and powerful one. Anderson’s detailed development 
of this idea is one of his major achievements. It is also important to have a complementary notion of 
dampening, or deactivation, to prevent, as Anderson points out, the whole network from getting turned on. 
But why periodic dampening? ‘After D units of time, activation will be dampened throughout the network. 
This means that all links and all nodes not on the Active List are deactivated’ (p. 123). There is no sense 
that this assumption ‘grows out of’ the basic nature of ACT in any organic way. Dampening is needed, so a 
dampening assumption is written in, and the way it is written is determined more by programming than by 
psychological considerations. 

A further difficulty with the periodicity assumption is that it means that any link that takes longer to 
activate than the D time units between dampenings can never become active. So in a fact retrieval 
task, as the strength of the correct link or link-sequence decreases relative to the combined strength of all 
possible links, so correct response latency should increase up to a limit beyond which performance 
completely breaks down. No such performance has, as far as I know, ever been observed. 
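
The shape of this prediction can be shown with a small sketch. It assumes, purely for illustration, a deterministic activation time inversely proportional to the link's relative strength; that timing rule and the constants `A` and `D` are simplifying assumptions of mine, not Anderson's formulation.

```python
# Sketch of the cut-off implied by periodic dampening (assumed timing rule).
A = 1.0   # hypothetical time constant
D = 10.0  # hypothetical dampening period


def retrieval_latency(link_strength, total_strength, a=A, d=D):
    """Return the time needed to activate the link, or None if dampening
    always wipes the link out before activation can complete."""
    # Assumed rule: time grows as the link's relative strength falls.
    latency = a * total_strength / link_strength
    return latency if latency <= d else None
```

Latency rises smoothly as the competing strength grows (2.0, then 8.0 time units for totals of 2 and 8), and then retrieval fails outright once the required time passes `D`.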

The final point about the bases of ACT is that it is separate from the perceptual and motor parts of the 
processing chain. ACT has to be presented with an encoded proposition, and the problem with this is not 
just that somebody else has to do the encoding, using their own, not ACT’s, intelligence, but that even if 
ACT were equipped with a perceptual parser and a response executor it would also have to be equipped with 
programs of some complexity to enable these different components to interact. And it is highly unlikely that 
these technical problems of interfacing correspond to anything the brain has to do. The pattern recognition 
and response execution systems do not just use memory, they are memories, and the knowledge that their 
interconnections represent cannot be decoupled from the knowledge that mediates between them. I have 
discussed this earlier with reference to Norman's book. 


Having looked at the foundations on which ACT is built, it is necessary to stand back and see how well it 
meets the general criticisms of memory modelling that were spelled out in the earlier part of this review. 
First, ACT is not easily falsifiable. As Anderson says, models like HAM and ACT are ‘great lumbering 
dragons that one could throw darts at and sting, but which one could never seem to slay... ACT is only 
going to be rejected by repeated attacks that keep forcing reformulations until the point comes where the 
theory becomes unmanageable with all its patches and bandages’ (pp. 531-532). But his defence of this type 
of theorizing is detailed and persuasive, and I buy it. Though I also hope, of course, that this review may 
sting a little, too. 

Second, ACT does not in the main concern itself with the naive and parochial distinctions of earlier 
cognitive psychology. Some distinctions are needed, naturally, and I think that Anderson gets some of them 
wrong. But they are rightly seen as the bases on which detailed predictions can be founded, not as 
themselves being open to direct empirical test. Anderson’s concern with the details of the contents, contexts 
and demands of tasks, and his recognition that small, superficial changes may lead to large processing 
changes, are exemplary. 

Third, ACT scores fairly well on the ‘isolation of memory’. I have criticized it for being decoupled from 
perception and response systems; but in being designed to cope with not only memory performance, but also 
language comprehension, generation and acquisition, problem solving and concept-formation it is greatly 
superior to any existing competitor. I am convinced that models of this degree of ambition — or pretension — 
are precisely what cognition needs, and that where ACT fails it often does so through not going far enough. 

Fourth, the issue of whether there is hidden inside ACT somewhere a ‘person’ that nudges it when 
nobody is looking is not easily decided. Because the model is so detailed, there are many places for him to 
hide. Where I think he is to be found in ACT is in deciding how the nature of a task determines which 
higher order productions will be used to structure it, and guide the way it is conceptualized. But I also think 
that this is an omission of detail, not of principle. Knowledge of the state of ACT at any moment, together 
with the inputs to it from the internal and external environments, should be sufficient to determine exactly 
the new state that results. The fact that this cannot at present be done is not a serious failing. What would 
be serious is if such statements as ‘...the notion was emphasized of the subject selecting from the set of 
propositions stored with a concept a number of propositions with which to elaborate the presentation of that 
concept’ (p. 405) were allowed to go unchallenged. It is not ‘the subject’ that selects, it is ACT that selects, 
and therefore ACT must be explicit about how that selection is determined. 

Fifth, ACT, like the other models discussed earlier, is too much preoccupied with the verbal, intellectual, 
rational side of life. It takes account neither of non-verbal purposes (it does not have to get things done in 
the real world, as we do) nor of non-verbal influences such as emotional and motivational ones. I do not see 
any natural way in which ACT could be broadened to include either of these considerations. It would be 
churlish to complain that John Anderson has not yet solved all the mysteries of psychology; but there is no 
reason why high-level models such as ACT should not be designed with at least half an eye on how 
developments in these directions might occur. 

Sixth, we have already seen that ACT attempts to be a model of memory change, as well as of memory, 
and that is to be welcomed. As I noted earlier, a memory that is designed to be mutable must be different 
from one that is not. Unfortunately the chapters that deal with this — particularly those on ‘Inferential 
processes and induction of procedures’ — are less elaborate and less convincing than those that focus on the 
use of memory. They read more like additions to ACT than the spelling out of aspects that have been there 
all along, but latent. There is, for example, no mention of either deductive or inductive learning in 
Anderson’s sketch of ACT that was quoted earlier. 

Finally, what of Allport’s charge about ‘a curious 
parochialism in acknowledging even the existence of other workers, and other approaches, to the phenomena 
under discussion?’ In fact Anderson stands up to this one pretty well. His reviews of areas of research like 
that on the encoding specificity principle are clear, concise, comprehensive, fair and accurate. Even if one 
has neither the time nor the inclination to study ACT in depth (and it requires considerable stamina to do 
so), the summaries of, and comments on, other people’s work are well worth reading. 

He does, though, only make detailed comparisons of ACT’s performance with one other model - HAM - 
and it is hardly surprising that ACT does better, as that is what it was designed to do. Anderson’s view is 
that it is up to Collins, Kintsch, Quillian, Norman or whoever to develop their models to the point where 
they provide viable alternatives to ACT, which, I suppose, is fair enough. He has done so much work in 
developing and testing ACT, both by simulation and experiment, that it is unreasonable to expect him to 
attack his own creation too hard. 

All in all, Language, Memory and Thought is a shining example to other researchers in its detail, its scope, 
its rigour and its honesty. Anderson's discussions of how he has changed his mind over the last few years, 
his explicitness about his biases and intuitions, and his willingness to admit to areas of current confusion or 
impasse in his own thought are enviable. (There are a few exceptions to his non-defensiveness, conspicuous 
by their scarcity. An experiment in chapter 9, for example, whose purpose on page 349 is to test an ACT 
prediction about subjective confidence in inference evaluation, becomes, six pages later, after the prediction 
is not fulfilled, an experiment ‘not to test the ACT model but to obtain a better understanding of the nature 
of subjective error in syllogistic reasoning tasks’.) ACT strengthens me in my conviction that models of 
memory will form the heart of the ‘theoretical structures within which to interrelate different sets of 
experimental results, or to direct the search for significant new phenomena’ (Allport), but that such models 
must be designed from their inception for the range of explanatory tasks which they will ultimately be 
required to fulfil. And although ACT is, in my view, fundamentally misconceived, nonetheless it provides 
an impressive model of the sort of thing that Allport was calling for, and which cognitive psychology very 
badly needs. 

I have devoted a lot of space to Anderson because he deserves it. The issues with which ACT attempts to 
deal are the central issues of cognitive psychology. More than that, they are updated versions of 
epistemological questions that have been around for thousands of years. But now let us look at some of the 
competition, as represented by the contributors to Cofer’s The Structure of Memory and Estes’ Handbook of 
Learning and Cognitive Processes: volume 4, Attention and Memory. Both are edited collections, with the 
emphasis in Cofer being on reporting original research — both theoretical and empirical — and in Estes on 
reviews that ‘reflect the current status of the field’. 

The Cofer book is the more interesting and readable, both because the authors are arguing their own 
positions, and presenting new work, and because it is considerably shorter. Had Estes been a more active 
editor his book would have been more valuable. As it is many of the chapters try to do far too much and 
end up being rambling and stodgy in the extreme. In addition, while it would be a touch unfair to call Estes’ 
authors ‘yesterday’s men’, the views of Murdock, Craik, Shiffrin, Massaro, Wickelgren and Atkinson are 
not unfamiliar, and not unavailable. Many of them have aired their positions elsewhere quite recently (e.g. 
Craik & Shiffrin in vol. 1 of Frank Restle et al.’s rival series Cognitive Theory, Atkinson in two volumes of 
the Attention and Performance series, and so on), and their reviews of their field overall are no improvement 
on the many that have been rushed into print in the last few years. 

Cofer provides us with a mercifully brief and instantly forgettable introduction to memory: if it is true that 
we store schema-plus-correction then no correction leaves no distinguishable trace. The storage metaphor for 
memory is accepted almost without question. We do get ‘The structural idea is a convenient metaphor, 
perhaps more congenial to the thought processes of the investigator. . .than descriptive of actual memorial 
functioning in the individual,’ but this is not followed up, and reads like a de rigueur echo of Tulving’s 
disingenuous disclaimer about the ‘reality’ of the episodic-semantic memory distinction. It seems to be good 
form to make a token statement of disquiet these days, but not to take it seriously. The first substantial 
chapter comes from Gilmartin, Newell & Simon. Called ‘A program modelling short-term memory under 
strategy control’, it elaborates the familiar Newell/Simon concern with control processes. ‘We hold as a 
basic assumption that the behaviors humans will show in task environment depends on the strategies they 
use’. They present a computer simulation of short-term memory, called SHORT, that is designed to show 
how the variations in performance between tasks can be attributed to strategy changes, rather than to 
changes in such structural parameters as size of STM, or coding rate. SHORT contains some 12 primitive 
operations that can be combined flexibly in the service of any task. SHORT’s big advantage is that it 
specifies how this combination is to occur: it is not the ‘subject’ who ‘decides’, ‘selects’ or ‘constructs’;
there is a general principle that the strategy chosen will be the one that performs the task best (i.e. with
highest retention, least processing, etc.). While real people do not always subscribe to the same principle,
SHORT mimics human STM performance quite well.

Estes’ chapter, ‘Structural aspects of associative models for memory’, starts by raising some of the 
questions about the storage metaphor that we should have found in Cofer’s introduction, and goes on to 
review (again, thank goodness, briefly) the history of associative theories of memory, up to and including 
Anderson & Bower, as a prelude to a brief description of his own ‘associative coding model’. This is similar 
to Anderson & Bower’s, but improves on it by seeing recognition as a process not of retrieval of a single 
critical piece of information but of reactivation and recruitment of a diffuse memory trace that contains 
elements of both stimulus and context. This opens the way to establishing some links with neurophysiology, 
which he does. Meyer & Schvaneveldt’s ‘Meaning, memory structures and mental processes’ provides an 
excellent review of their many studies on semantic judgement and lexical decision tasks, which provide 
elegant support for the notion of spreading activation, or mutual excitability of detectors (‘logogens’) within 
an associative network. Kintsch, in his ‘Memory for prose’, gives us another bash at his propositional theory
of semantic representations. It does not improve with age. Nor does the power of its predictions. ‘The
interesting result was that reading times increased regularly with an increase in the number of propositions in 
the sentence base, although the sentences did not differ in the number of words to be read’. Wow! I just 
hope he considered carefully before rushing into print with this paradigm-busting piece of news. And worse 
is to come. After rocking us with the fact that more complicated sentences take longer to read, he follows 
up with the audacious claim that they are also harder to remember. I can’t go on: the excitement is too
much. The stout-hearted and well-fortified reader is referred to the chapter for more body blows to his 
Weltanschauung. 

When Donald Norman is ‘off-duty’, as he is here with Danny Bobrow, in ‘On the role of active memory 
processes in perception and cognition’, he is very good value. He has the ability to say interesting, 
non-trivial things in a way that is not only clear but actually revives some of one’s flagging enthusiasm for 
the subject. Starting from the obvious but widely ignored truth that memory does not follow perception, it 
subserves it, Norman & Bobrow show how an associative memory structure with a superimposed, variable 
pattern of activity of limited magnitude can achieve the desired integration. Winograd (‘Computer memories: 
a metaphor for memory organization’) and Schank (‘The role of memory in language processing’) focus 
particularly on computers and computer languages as sources of ideas about cognition. But I’m afraid this 
particular metaphor is on its way out. Things like ‘The completion of the change of knowledge will enable a 
PTRANS to the new location so that 


FIND(LOC(X)) = KNOW(LOC(X))+PTRANS(TO LOC(X))’ 


used to fill me with excitement: now they just leave me feeling rather tired. 

The final chapter is by Cofer himself, with Chmielewski and Brockway, and called ‘Constructive processes
and the structure of human memory’. It concerns the way in which relevant knowledge is activated in 
memory during the reading of prose, but their main thrust is to show that ‘the consequences of that 
activation will depend strongly upon what the individual sees his task to be’, and thus to cause us to 
question the utility of the concept of a static structure of memory. The doubt expressed tachistoscopically 
by Cofer in the introduction, and rather more clearly by Estes, is here taken up in the context of some 
Bartlett-type data, in which the amount and type of material recalled varied considerably with task demands. 
Sadly both the data and the discussion are very hazy, and Cofer et al. do not trust themselves to do more 
than ‘warn against an irreversible decision opting for a conception of human memory that contains fixed 
structural arrangements’. All in all The Structure of Human Memory contains signs of dissatisfaction with 
the cognitive psychological status quo of the early ’70s, and scattered bits and pieces of ideas about what to
do. We are left with a view of ‘memory’ as a subject in transition, which is just what it is. 

Estes’ Handbook, as I have said already, is too long, dull and a bit old-hat. It contains chapters by Spear 
on ‘Retrieval of memories: a psychobiological approach’, Murdock on ‘Methodology in the study of human
memory’, Craik & Levy on ‘The concept of primary memory’, Shiffrin on ‘Capacity limitations in 
information processing, attention and memory’, La Berge on ‘Perceptual learning and attention’, Massaro on 
‘Auditory information processing’, Wickelgren on ‘Memory storage dynamics’ and Wescourt & Atkinson on 
‘Fact retrieval processes in human memory’. The two most interesting chapters are those that are slightly off 
the well-worn track of pattern recognition/attention/short-term memory (the central areas of the book) - that 
is to say that by Spear, which tries to integrate some of the work on animal learning with the Tulving 
perspective on verbal learning, and by La Berge on the organization and modification of recognition 
networks. But overall the contributions go very little way towards meeting the general criticisms of the field 
that I raised earlier. If this really is the current state of the art, I suggest we note its flaws, and take our lead 
for the future, despite its own inadequacies, from John Anderson’s magnificent and monolithic theory. 

GUY CLAXTON 

Book reviews


Parasuicide. Edited by Norman Kreitman. London, New York, Sydney, Toronto: Wiley. 1977. Pp. 193. 
£9.95; $19.00. 


This book is the cooperative effort of Dorothy Buglass (epidemiologist), Drs Trevor Holding, Peter F. 
Kennedy and Norman Kreitman (psychiatrists), and Alistair E. Philip (psychologist), who acknowledge the 
cooperation of several other past members of the MRC Unit for Epidemiological Studies in Psychiatry at 
Edinburgh. The book consists mainly of edited and updated versions of previously published papers. This 
volume can be highly praised and recommended. Every fact is fully discussed and documented in great 
detail. Only a few highlights can be given in this review to indicate its contents. 

First of all, the route is described by which, starting from ‘attempted suicide’, the concept of ‘parasuicide’
was reached. Parasuicide emerges as ‘a non-fatal act in which an individual deliberately causes self-injury,
or ingests a substance in excess of any prescribed or generally required therapeutic dosage’. Accidental 
cases, cases suffering from alcoholic poisoning alone and incidents befalling drug users are not included in 
this study, which was carried out in the Edinburgh area; most cases had been seen in the Regional Poisoning 
Treatment Centre, but general practice studies were carried out as well. In relation to the epidemiology of 
parasuicide, some 20 per cent of cases had not been admitted to hospital. As their social characteristics 
were very similar to those of hospital cases, it was confirmed that in all cases of parasuicide there had been 
a good deal of social pathology. A depressive illness had been diagnosed in 40 per cent of women and in only
28 per cent of men, while personality disorders were found rather more frequently (in 53 per cent of men 
and 50 per cent of women). Alcohol had made a significant contribution in 48 per cent of men, but in only 16 
per cent of women. Attention is drawn to the recent marked increase of parasuicide, but this was found to 
affect mainly men belonging to the semi-skilled and unskilled socio-economic groups. In contrast to previous 
studies, personality disorders remained an important aetiological factor in the older age groups, and the 
presence in their case of circumscribed depressive and organic mental illnesses was only slightly higher than 
in persons of an earlier age. Among other findings of the ecological study, there were four times more than 
‘expected’ contacts with persons who had themselves at some time shown parasuicidal behaviour. The
previous results of psychological tests in parasuicidal subjects are reviewed. The tests used in the present
study included Foulds’ Symptom Sign Inventory, the Direction of Hostility Questionnaire and the 16
Personality Factor Questionnaire. No ‘suicidal personality type’ emerged, but there was a general inability of 
parasuicidal subjects to form interpersonal relationships. Rather unexpected (at least to the reviewer) was 
the finding that parasuicidal subjects were extra- rather than intra-punitive, especially when a personality 
disorder was diagnosed in terms of the Symptom Sign Inventory; nor was ‘risk-taking’ associated with 
propensity towards parasuicide. 

In these subjects completed suicide was a comparatively rare later event, and by and large suicidal and 
parasuicidal persons differed markedly from one another. Further parasuicide acts occurred during the first 
year of the follow-up in 16 per cent of persons. This rate has apparently remained fairly constant over the 
years, and repeat parasuicides were found especially heavily loaded in terms of sociopathy, use of drugs and 
alcohol, unemployment, and low social class. A reasonably successful predictive scale for repeat parasuicide 
could be designed. In an important chapter on prevention, stress is rightly laid on the reasons for the many
sources of failure. To highlight only two points, over a quarter of parasuicides had been in contact with 
helping voluntary agencies, and more disturbingly over half in receipt of psychotropic drugs claimed that 
their problems had not been discussed with those prescribing them. An experimental approach to the 
prevention of repeated parasuicide was only very partially successful, but the insights gained should lead to 
further work. 

It is predicted that this volume will remain the standard work on the subject until it gives way to a further 
edition reporting progress. 

FELIX POST 


Personality and Adjustment in the Aged. By R. D. Savage et al. London, New York, San Francisco:
Academic Press. 1977. Pp. 186. £8.20; $16.00. 


In 1971 Savage and a group of his colleagues published a book on their findings concerning intellectual 
functioning in various samples of old people in Newcastle-upon-Tyne. The present volume covers the 
non-cognitive aspects of personality functioning as well as of adjustment in the same samples of elderly 
subjects. 

The first 60 pages of the present volume contain a survey of the literature. It starts with a review of the
personality measures used in elderly people by earlier workers and ends with a discussion of self-concept
and adjustment measures in late life; an intervening section on psychiatric matters is more cursory and not
very relevant, especially as some of the definitions and descriptions are repeated in a later chapter of the 
book. Unfortunately in this and other chapters there appear to be from time to time misplacements or 
omissions of headings. For instance, in the first chapter the section with the heading ‘Acute confusional and 
delirious states’ leads on directly without break to several pages in which justified doubts are cast on the 
psychiatrist’s kind of classification by contrasting it with concepts employed by social workers and scientists. 
This is followed by an account of response to psychiatric care under a new and separate heading, but what is 
in fact summarized in this section is the response of elderly persons without specified psychiatric problems 
to care within institutions, and the way in which this response and rehabilitation are related to personality 
factors. 

In the body of the book the workers report how they carried out separate investigations with different 
personality and adjustment measures on each of five random samples of persons over 65, three living in the 
community and two in institutions. The measures included the Minnesota Multiphasic Personality Inventory, 
the Maudsley Personality Inventory, Cattell’s 16 Personality Factor Questionnaire, the Tennessee 
Self-Concept Scale, the Life Satisfaction Indices, and the Eysenck Personality Inventory. The findings are 
fully presented and should be of great interest to research workers. 

Only the last of the studies reported in the final chapter headed ‘The structure of personality in the aged’ 
lends itself to brief summarization. The Cattell 16 PFQ (Form C) was administered to a small community 
sample, and the scores were subjected to cluster analyses and other statistical tests. Four clusters emerged,
and both profile analysis and discriminant function analysis confirmed significant differences between these
four clusters. In this way four personality types could be differentiated. Their characteristics were briefly as 
follows: 54 per cent of this small sample of 82 subjects formed the statistically normal group, members of 
which were characterized by being more apprehensive than members of younger age groups, more suspicious 
of outside interference and not happy with changes. At the same time they were shrewd and deliberate. 
Sixteen per cent of the sample were self-sufficient and resourceful as well as proud of their independence. 
They seemed tough-minded, worldly and consistently stable. By contrast the other two groups showed clear 
evidence of deviation and maladjustment. An ‘introverted’ group (19 per cent) were oversensitive, often with 
a depressive outlook, shy, withdrawn and unadventurous. A more obvious ‘perturbed’ group forming 11 per 
cent of the sample were very suspicious and emotionally unstable, as well as given to irrational worries and 
anxiety. Almost all individuals in this group were, on ‘blind’ examination by a psychiatrist, found to be in
need of treatment. 

Finally, one can only echo the authors’ view that considerably more work, particularly longitudinal 
investigations of personality and integration of personality models with those of cognitive functioning, is 
needed. 

FELIX POST 


The Social Challenge of Ageing. Edited by D. Hobman. London: Croom Helm. 1978. Pp. 287. £8.95.


This collection of essays, edited by the Director of ‘Age Concern’, is a multidisciplinary offering. There are 
chapters by a geriatrician, social worker, architect, and even a clergyman. The only possibly relevant 
discipline to be omitted from the list of contributors is psychology, although references to psychological 
work are included in some of the chapters. 

The standard of the offerings is variable and the book could probably have benefited from a firmer 
editorial direction. Nevertheless there are worthwhile things on offer. Havighurst’s basically statistical 
overview of ageing in Western society (in this case mainly the USA) is complemented by a similar chapter on
the East (mainly Japan). This reveals some interesting cultural differences. As compared to the West a very 
high proportion of Japanese elderly live with their children, usually with the family of the eldest son. In the 
rest of the book the sensible chapter by Brearley on social work with the aged stands out whilst those on 
health and psychiatric disorder are competent but a little pedestrian. The closest to psychology is Newcomer 
and Bexton’s chapter on ‘Ageing and the environment’ which is an overlong account of certain uninspiring 
and largely theoretical ideas on individual-environment interactions. 

Education (both in terms of teaching the elderly and educating the various professions about the elderly), 
the religious aspects and architecture complete the mix. Anyone with the slightest experience of working with 
the elderly will appreciate the necessity of well-designed accommodation. The architectural chapter could 
therefore have been very valuable. As it happens it is interesting but not directly relevant to the main thrust 
of the book. It describes an educational experiment carried out in the School of Architecture at Oxford 
Polytechnic aimed at giving students a wider appreciation of the social situations and needs which might in
some circumstances be dealt with by the design of an appropriate building. It just happens that the 
particular project involved accommodation for the elderly. The account deals with the administration of the 
project as an educational exercise and not with any design solutions that may have emerged at the end. 

It is tempting to be chauvinistic and deplore the lack of a specifically psychological contribution. This 
would have rounded the book off and made it more complete. The omission of psychology might reflect the 
fact that so few psychologists are involved in work with the elderly. The medical profession and the social 
services may be no more inherently enthusiastic about the elderly but they cannot totally ignore the 
problems presented. Applied psychologists, especially clinicians, are freer to stay away and commonly do. 

This is not a book that will appeal directly to psychologists. Despite its unevenness it does give some 
useful wider information about the elderly and their problems. Those psychologists whose work brings them 
into contact with the elderly from time to time might find some worthwhile background reading in its pages. 

EDGAR MILLER


Self-care in Health. By John D. Williamson & Kate Danaher. London: Croom Helm. 1977. Pp. 216. £8.95. 


Self-care is a popular subject on two counts. Politicians and health administrators, worried by the rising cost 
and diminishing returns of high-technology medicine, are attracted by the idea of passing responsibility from 
the health service back to the individual. For different reasons, sociological critics of professional dominance 
are interested in trends that seem to favour the autonomy of the consumer. In fact, self-care is perhaps the 
commonest approach to illness and is likely to remain so unless the health service reaches unattainable 
heights of accessibility and effectiveness. Compared with such an ideal, the hazards of self-care, including 
undetected illness and misapplied remedies, are obvious enough. In the real world, these have to be set 
against the cost and iatrogenic complications of medical intervention. 

This book by a general practitioner and a health education officer (both of whom have research 
experience) attempts to explore some of the economic, political, ethical and educational implications of 
encouraging self-care. In the preface, its authors disclaim having produced either an academic discourse or
a do-it-yourself guide to disease prevention or self-treatment. Their objective is to widen the debate among 
health professionals, fringe practitioners and the lay public. The result is rather a mixed bag. 

To take the good points first, there are over 20 pages of bibliographical material which must include the 
great bulk of what is currently available (in English or its sociological approximations) about the 
effectiveness of traditional medical care and the alternatives. This is well deployed in chapters about 
deficiencies in current medical practice, self-medication, preventive health and professionalism, which both 
lay and professional students of health care will find useful for updating and reference. There are sensible 
comments about illness-awareness, interesting ones on folk-medicine and patent remedies, and Utopian ones 
about health education. In spite of some rather cavalier treatment of dates and statistics, this is all 
stimulating and readable. 

The book is less good when it moves to speculation. The core chapter (‘Towards a self-care policy’) is 
repetitive and diffuse. The writing varies from the racy (‘However, if anyone believes that medical 
education is easily amenable to change he may be in for a shock’, p. 151) to the impenetrable (‘The 
problem with much of self-care is that a good deal of it is based on the assumption that personal preference 
for a scientifically adequate manoeuvre is the paramount justification for its being undertaken’, p. 147). 
There is some wild and rather messianic generalization about, for example, the (‘revolutionary’) social
implications of an Open University course in medicine, which contrasts sharply with cautious conclusions in 
earlier chapters about the limited possibilities for self-care in low-technology medicine and self-monitoring. 
Too much of the material is ill-digested. In a particularly unsatisfactory passage about current trends in 
medical care research, the authors fail to mention the work of social scientists and others which they 
themselves cite elsewhere in the book. 

Given these doubts, the final conclusion, that a coherent policy of self-care would require the most 
fundamental reappraisal of health care and of doctor-patient relationships since the introduction of the 
National Health Service, seems somewhat overdrawn. But the authors deserve credit for opening up a 
difficult area. 

In a generous foreword, Dr John Fry suggests that the book should be compulsory reading for private 
families and individuals. At nearly £9 for 175 pages of text, this seems an unrealistic expectation. 

R. G. S. BROWN 


Research to Practice in Mental Retardation. Volume I: Care and Intervention. Edited by Peter Mittler.
Baltimore: University Park Press. 1977. Pp. xxiv+A-30+472. £17.50. 


To edit conference proceedings is no mean task, especially when, as in this case, the task was to publish about
one-third of the papers presented at the Fourth Congress of the International Association for the Scientific 
Study of Mental Deficiency. The Editor and the members of the editorial board have, in making their choice, 
considered, amongst other things, the material available, the themes that were prevalent at the congress and
the likely readership of the published proceedings. An individual reader might disagree with the selection but
there is really no alternative to accepting the views of an editorial board and recognizing the fact that
individual interest and taste will determine how a particular individual contribution is favoured. A list of 
papers that were given at the Congress, and have not been published, appears at the end of vol. I, together 
with the name and address of the author. The titles that whet the appetite can be requested. 

The theme of the Congress was ‘From Research to Practice’ hence the title of the book. This volume, after 
reporting the Opening, Closing and Presidential Addresses, is divided into ten sections as follows (number of
contributions per section in brackets): ‘Research and Policy’ (2), ‘Attitudes’ (3), ‘Ethical and Related Issues’
(5), ‘Epidemiology’ (2), ‘Early Intervention’ (12), ‘Working with Families’ (5), ‘Residential Services’ (13),
‘Community Services’ (6), ‘Psychiatric Services’ (5) and ‘Costs’ (2). Whilst the whole is directed to the
person interested in the mentally handicapped, a large portion of this volume is related to issues that are of 
interest to psychology and the social sciences in general, e.g. the sections on ‘Early Intervention’ and 
‘Residential and Community Services’. Individual contributions obviously differ in their appeal but the 
majority make points that are worth noting. The whole provides both food for thought on some of the
fundamental issues in the area of care and intervention with respect to mental handicap and examples of
practice that could well be followed by the reader in his own practical situation. The only feature that 
annoyed was the first papers, possibly chairman’s remarks, in each of the sections on ‘Residential Services’ 
and ‘Community Services’. They appeared to dismiss points, quite rightly, that were brought up in the 
papers that followed. These papers might have been better placed at the end of the section. 

The theme of the congress rather than the title of this volume provided the set for my reading and 
throughout the theme appeared to be inappropriate. Research in Practice seemed to be more applicable to 
the majority of papers. This observation is supported by A. D. B. Clarke in his presidential address in which 
he refers to the gap between our knowledge and our practice. The papers might have provided a stimulus for 
contact between researcher and practitioner at the Congress itself but the reader of the proceedings is 
presented with many papers which describe research in practice in a practical/service setting rather than with 
papers that concern themselves with the application of research knowledge to the practical/service situation. 
Nevertheless, to undertake research at the service level is a commendable activity and should go far in 
uniting the researcher and the practitioner. But there is still a need for knowledge gained by researchers
independently of the practitioners to be made relevant to the service provision. 

For those working in the field of mental handicap, who are unable to afford to attend international 
congresses, the publication of the proceedings is valuable. It allows one to become aware of current 
developments and thinking in the field and provides a useful source of reference material. The cost of the 
whole proceedings is likely to be prohibitive for the individual buyer but the volumes are worthy of purchase 
by libraries, and units for providing services to the mentally handicapped. They will provide a source of 
references and ideas for service provision both for the researcher and practitioner. 

NIGEL BEASLEY 


Experience and the Growth of Understanding. By D. W. Hamlyn. London: Routledge & Kegan Paul. 1978. 
Pp. xi+159. £6.50.


This is another volume in the International Library of the Philosophy of Education, a series whose stated
aim is ‘to build up a body of fundamental work in this area which is both practically relevant and
philosophically competent’. There is no doubt about the philosophical competence of the author, a
distinguished philosopher and the editor of philosophy’s premier journal, Mind. A more debatable point is the 
extent of his success in showing that his philosophical approach is of practical relevance to his chosen 
subject, namely an explanation of human learning and our acquisition of knowledge. 

Both the author’s defence of his philosophical approach and the particular solution of his problem show a 
strong Kantian influence. Thus, following Kant in holding that a philosophical question typically has the 
form ‘How is X possible?’ - so that Kant, for instance, took as his three main problems ‘How is
mathematics possible?’, ‘How is science possible?’ and ‘How is metaphysics possible?’ - Professor Hamlyn
sees as his problem ‘How is learning and the growth of the understanding through experience possible?’.

My suspicion, which is far from allayed by the feeling that the author seems frequently to have to ‘protest 
too much’ against this charge, is that, like Kant, he crosses backwards and forwards over the border 
between philosophy and psychology and that, just as Kant’s faith in Euclid’s geometry and Aristotle’s logic 
confused the psychologically normal with the logically necessary forms of experience and thought, so 
Hamlyn’s picture of the growth of human understanding is more empirical than his philosophical brief should 
allow.

The author’s particular picture of human acquisition of knowledge is also strongly Kantian. Having 
shown, quite correctly and in detail, that theories of learning are commonly either empiricist - Piaget’s
‘genesis without structure’ - as, for example, Aristotle, Locke, Hayek, Watson and Skinner, or rationalist -
Piaget’s ‘structure without genesis’ - as, for example, Plato, Gestalt theory and Chomsky, he follows his
criticisms of these alternatives by the adoption of Kant’s combination of the two - Piaget’s ‘genesis with
structure’. Like Kant, the author goes on to work this out in, sometimes deliberately Kantian, detail for 
perception, the acquisition of concepts and the beginning of understanding and also for the origin of 
language and later adult learning. For example, he examines how far belief or knowledge is either a 
necessary or a sufficient ingredient in all perception and what rôle sensation plays, not merely in kinaesthetic 
perception, but even in vision. He discusses how far the acquisition of concepts is possible for those, e.g. 
animals or children, who either do not or cannot possess a language; to what extent all acquisition of 
knowledge is learning or vice versa; how interdependent in growth are self-knowledge and knowledge of 
other things or other people; the possession of what ideas, e.g. of truth and falsity, are necessary to the 
learning of a language. He concludes with one chapter on later learning and forms of learning other than 
those involved in intellectual development and with another - perhaps only as lip service to the educational
nature of the series in which this volume figures - on the nature, aims and practices of teaching.

Hamlyn’s own particular contribution throughout the book is, following Wittgenstein’s notion of a 
‘common form of life’, an emphasis on the social element in all learning and in the acquisition of such other 
human characteristics as feelings, wants, attitudes, etc. 

There is a lot to be learnt from this book, even though I am not sure how much of it is philosophy and 
how much is psychology and, therefore, how far readers will feel that it is in danger of falling between the 
stool of experimentally backed empirical work and that of analytically based conceptual clarification and, as 
a result, landing fairly and squarely on neither. 

ALAN R. WHITE 


The Psychology of Cognition. By G. Cohen. London: Academic Press. 1977. Pp. ix+241. £7.50. 


About 20 years ago the study of cognition entered a phase of revolutionary ferment which is still with us. 
Post-war developments in electronics gave new ways of recording and dealing with data. Advances in 
computer technology prompted researchers to seize upon ‘models’ and explore their applicability to people's 
performances in knowledge-handling tasks. Exuberant research branched out in all directions. Cognitive 
psychology gained a vitality it had not had since the turn of the century. Today, this explosion of research 
occupies an important sector in the long history of attempts to understand more about human cognition. 
However, it has achieved less than enthusiasts once hoped. The chief lesson to date is that no ‘model’, 
which is both precise and relatively simple, casts much unambiguous light on any reasonably large sample of 
thought and language. In the face of this hard-won lesson, many investigators have recently paused to 
ponder the cautions which Allen Newell issued in his 1973 paper ‘You can’t play twenty questions with
nature and win’. 

Against this contemporary scene, Gillian Cohen’s book is timely, not because it comes up with any grand 
new theory of cognition, but because it aims, above all, to give investigators and advanced undergraduates a 
critical approach to research work in cognition. The emphasis is on critical evaluation. The reader is 
constantly invited to examine presuppositions and objectives, test arguments for logical consistency, weigh 
evidence for and against conclusions, scrutinize the status of questions. This makes for salutary, albeit stern 
and sometimes dull, reading. The book does not address the psychological novice, who is likely to see it as 
disputatious. Nor is the general reader wooed by its focus on theory and research. However, for the 
cognoscenti, it is highly worth while. Written in compact, admirably clear English, it provokes healthy 
reflexion and goes far to meet the objectives set out in the Preface as follows: ‘The Psychology of Cognition 
examines some central topics in the study of thinking, and tries to evaluate, and set in perspective, the 
developments of the last decade or so in these areas... The emphasis is on the overall patterns of evidence 
that are discernible when we try to correlate the results of different methods - experimental testing, computer 
simulation, clinical studies and observations of everyday behaviour. An attempt is made to assess the 
relative power and scope of different methods, to discuss their limitations, and to show how models of 
cognitive processes gain support when the evidence from different sources converges.’ 

These objectives are pursued by dividing the book into nine chapters. Each is relatively self-contained, has 
its own bibliography, and comments cogently on some large topic, i.e. semantic memory, visual imagery, 
problem solving, the nature of language, the nature of thought, the nature of concepts, computer simulation 
and hemisphere differences. For each topic, some main theoretical issues are selected and considered with 
regard to conceptual status and empirical pros and cons. The final chapter, which might well have been 
placed first, looks at ‘The state of cognitive psychology: problems and panaceas’. Panaceas are neatly 
punctured, the debate on reductionism is nicely surveyed, and the final conclusion is that ‘future progress 
will be best ensured by the cultivation of greater methodological eclecticism’. Throughout, the book keeps 
its research-oriented audience steadily in view, sticks to its expressed objectives, and sustains a reflective 
commentary which is lucid, sober and devoid of padding. 

The book does not fall easily into any conventional category. Partly a textbook in cognitive psychology, 
partly a treatise in epistemology, it can perhaps be set in perspective by reverting to Newell's comments 
about playing 20 questions. In that game, you do not get far by asking about isolated possibilities. You do 
better to frame questions about ranges of possibilities and have forethought about cumulatively narrowing 
down the context of relevance. With age and experience, children progress from haphazardly gambled 
particular questions towards interrogative sequences which home in on the target. But their progress extends 
over years and is by no means linear. It incorporates many piecemeal acquisitions, and involves the 
emergence and correction of some spurious strategies, such as asking questions which have the surface 
form, but not the practical function, of constraint-seeking questions. 

How children develop skill in playing 20 questions has much in common with how researchers develop 
skill in inquiry. In both cases, progress comes by slow, tortuous routes and depends on various attainments 
which all contribute to a wider, wiser background of organized comprehension about domains of possibilities 
and about the nature of inquiry. Through her critical evaluations, Gillian Cohen helps cognitive researchers 
toward that wider, self-reflective background of comprehension on which more effective inquiry depends. 

I. M. L. HUNTER 


Growing Points in Ethology. Edited by P. P. G. Bateson & R. A. Hinde. London: Cambridge University 
Press. 1976. Pp. viii+548. £16.00 hard, £5.50 limp. 


This volume contains the proceedings of a conference held in 1975, on the 25th anniversary of the founding 
of the sub-department of animal behaviour (University of Cambridge) at Madingley. It is divided into four 
sections: (1) ‘Motivation and Perception’; (2) ‘Function and Evolution’; (3) ‘Development’; and (4) ‘Human 
Social Relationships’. Each section is preceded and followed by editorials whose function is somewhat 
redundant. 

Peter Medawar in his contribution on ‘Whether ethology throws any light on human behaviour’, makes a 
typically illuminating comment on the nature of ethology itself: ‘I think ethology has two distinctions: in 
trying to make teleonomic sense of behavioural performances that might seem to inexperienced observers to 
be a stream of inherent and functionless activities, ethologists are not yet importuned by an insistent and 
urgent need to find a causal explanation for every phenomenon they observe. Closely related to this is the 
welcome truth that ethology, unlike some psychological systems, is not yet crabbed and confined by the 
doctrinal tyranny of any pre-existing explanatory system. These two characteristics give ethology the 
freshness and spontaneity which other biologists find so enviable and which are sadly so lacking from many 
of the older and more conventional branches of zoology.’ 

Despite a bow in this direction, the two editors in their conclusion seem to be more concerned with the 
possibility of introducing some degree of formalization. They say that the time is clearly coming when 
something more than a broad conceptual framework to order the increasing wealth of evidence will be 
needed. This is a widespread misconception in the behavioural sciences; namely to confuse heuristic with 
explanatory concepts. At the start of a discipline broad frameworks of thought are set up to encompass and 
delineate an area of research. They are often called ‘generalizations’, and so become touchstones of 
explanation and often operate unconsciously as group norms. What are required of course are abstractions 
and logical inferences concerning the structure of behaviour before depth analysis is pursued. There is still a 
great deal of conceptual confusion in the behavioural sciences and indeed in ethology itself, due mainly to 
the history of this area of investigation, especially its late arrival on the scientific scene. Confusion also arises 
from the fact that all behaviour studies are based on the use of concepts used for the everyday description 
of human behaviour, and from the tradition within biology of referring everything to strict evolutionary theory. 
Between these two extremes the problem of the level of organization and the structures of the organization 
of behaviour are neglected. 

Despite this, however, the volume contains a lot of interest, especially to experimenters or scientists in 
closely related disciplines such as experimental psychology. Those interested in human behaviour and the 
relevance of ethological procedures of investigation to their concerns will find less of immediate interest, 
though certain contributions are stimulating. 

In the section on motivation Richard Dawkins’ discussion of hierarchical organization is explicit, very well 
illustrated and should be grasped by anyone who wishes to be acquainted with this fundamental part of 
ethology. It is not just that hierarchical organization is a powerful heuristic concept but that the working out 
of this concept in behaviour studies is likely to be uncovering the basic property of living things as manifest 
in their behaviour. 

D. J. McFarland and John C. Fentress in different ways discuss the organization of behaviour in terms of 
form, function and interactional processes. McFarland with his usual rigor defines the cost function of 
behaviour relating it to the ‘pay-offs’ in the ecological niche of the particular animal. John Fentress with a 
flair particularly his own shows how: ‘Factors which normally might be viewed as specifically relevant to one 
class of behaviour can under certain circumstances ‘activate’ another class of behaviour.’ 

It is a pity that Richard Andrew has confined his attention to attentional processes with which he has been 
particularly concerned, and has not allowed his fertile mind to roam over fields which have been opened up 
through the study of the social structure of attention. 

The second section entitled ‘Function and Evolution’ would have been better labelled ‘Social organization 
and evolution’ as in fact all four papers deal with the relationship between these two aspects of behaviour. 
Although specialized, Peter Marler’s ‘Social organisation, communication and graded signals’ is of his usual 
masterly standard, and B. C. R. Bertram's ‘Kin selection’, illustrated by lion communities, is not only a 
working example of how the theory of evolutionarily stable strategies is being applied to a particular case 
but also shows how kin selection will be influenced by other forms of selection. 

T. H. Clutton-Brock and P. H. Harvey examine how the costs and benefits of types of social behaviour 
can be assessed for individuals in such a way as to formulate hypotheses which can then be tested. By 
doing this they get away from functional explanations of variation in primate social behaviour taking the 
structure of the group as a whole as the unit. This may be so in relation to the working out of the 
individuals’ social relations within a group. To abstract features of the social groups as a structure is 
however an equally valid procedure and reveals the evolution of particular individual characteristics that are 
dependent upon particular kinds of social structure for their expression. This is the antithetical approach to 
the use of social structure as a basis for predicting the emergence of individual characteristics even though it 
says nothing about the immediate selective pressures operating on those individuals. 

Finally in this section Nick Humphrey provocatively proposes the need for a laboratory test of social skill as 
a complement to the existing IQ test, but there are dangers in this, as outlined by a symposium on Biology and 
Ethics, arranged by the Institute of Biology in 1969. 

Within the section on development ‘The place of genetics in the study of behaviour’, by Aubrey Manning 
stands out and is a great help to those who, if they can persist for long enough to finish it, will then be able to 
understand in simple creatures the nature of hereditary influences on behaviour, if not in the study of human 
behaviour. As he says: ‘If our aim is a scientific study of behaviour, we must be able to test and if necessary 
reject our everyday culture’s categorisations of behaviour. Though these are valuable sources of ideas they 
are likely to be too simple, too imprecise and insufficiently explicit. (They may even be designed to mislead, 
see Discussions below.) It is too easy to pay lip service to science and to continue on the assumption that we 
know how people really work.’ 

To use common-sense concepts is unavoidable in the early stages of an inquiry into any domain. As we 
proceed with our analysis however it is imperative that we engage in a constant process of conceptual 
redefinition moving from metaphors with their vague but primal apprehension of order in behaviour to 
concepts which comprehend that order and describe it with increasing precision. Blurton Jones is clearly 
aware of the pitfalls which beset those who start into this area of investigation but, since it is not too much 
to say that much of psychology has been bedevilled by them, all those who are interested in this field should 
read his contribution. From what I have already said it is not surprising that he should note that ‘natural 
selection studies so far imply disappointingly little about specific mechanisms of motivation and 
development’. The reason for this is clearly the absence of concern for social structure, to which so much of 
man’s mind is adapted, and too much concern with the immediate requirement of satisfying the tenets of 
natural selection. 

At the end of the book Niko Tinbergen imparts a sense almost of dismay in listing the social problems of 
the modern world. There is a lack of continuity here between the information imparted by ethology and the 
part ethology is supposed to contribute to solving these problems. The problem is seen as one of adjusting to 
modern social conditions through an exercise in persuasion and of adjusting the social structure to the 
requirement of people’s nature. It is seen almost as a process of instruction. Had there been more concern 
with the nature of social structure the essential bimodality of mental infrastructure would have emerged and 
the future would be seen to lie in man’s capability of being intelligent. 

The ethological contribution to the solution of our modern problems would then be clearly evident. It 
would be to impart a knowledge of mental structure which could assist individuals to upgrade their own 
intelligence. 

M. A. R. CHANCE 


Handbook of Parapsychology. Edited by Benjamin B. Wolman. New York: Van Nostrand. 1977. Pp. 
xxit+967. £28.35. 


This text contains an enormous amount of information from all areas of parapsychology and in most respects 
is well worthy of its name. It is divided into 11 major sections and starts with a historical overview followed 
by sections on research methods, perception and communication, physical systems, altered states of 
consciousness, healing, survival after death, miscellaneous other fields, models and theories, Soviet research, 
and suggested readings and glossary. Each section is subdivided into chapters authored by such noted 
exponents of parapsychology as J. Beloff, J. B. Rhine, G. Schmeidler, and J. G. Pratt. 

The overall style is fairly technical though no inhibitions are evident with regard to the range of 
phenomena discussed; they are all here, telepathy, clairvoyance, psychokinesis, reincarnation and 
spiritualism, and the reader is treated to both conservative and sensational propositions from the various 
authors. 

I found Ehrenwald’s short chapter on parapsychology and the healing arts probably one of the most 
complete chapters from the point of view of considering in detail alternative explanations to the paranormal. 
Ehrenwald discusses the problems of spontaneous remission and placebo aspects in the therapeutic situation 
giving a good two-sided argument. Other authors are a little less reserved in their speculations. Thus 
Stevenson suggests that an understanding of reincarnation may supplement our present knowledge of 
heredity and environmental influences. Eisenbud spends almost a whole chapter on the amazing story of Ted 
Serios, an elevator operator who, under hypnosis, could somehow produce pictures of unidentified places on 
unexposed film. Roll suggests that poltergeists, or recurrent spontaneous psychokinesis (RSPK) may be the 
result of a synonymity of physical and mental space, so that when an investigator enters the space 
surrounding an RSPK agent, he literally enters his mind. I think it is reasonable to say that the evidence 
presented for parapsychological phenomena is in the main presented from the point of view of the 
‘credulous’ rather than the ‘openminded’, and certainly not from the point of view of the sceptic. Problems 
of deception, distortions of memory and perception, and statistical flukes are certainly entertained, but 
generally take second place to the vast amounts of anecdotal and laboratory evidence which are proposed in 
favour of the parapsychological phenomena. 

As an indication of the credulity which pervades the book it is interesting to note that the performances 
of Uri Geller are quoted on a number of occasions as providing evidence for the paranormal, yet 
not one reference is made to Geller’s critics. The important recent discrediting of the work of W. J. Levy, 
Jr., with regard to psi in animals, is mentioned in a couple of lines on two occasions, but is otherwise 
conspicuous by its absence, and Hansel's classic attack on ESP hardly gets a look in. Thus I feel the book 
certainly lacks a section devoted totally to a sceptic’s view of ESP research. There are perhaps many 
people who are interested in parapsychology in terms of the implications for our understanding of more 
mundane processes who might have welcomed some chapters devoted more to other relevant issues such as 
fraud, the problems of eye-witness testimony, and the development of beliefs in parapsychology. A recent 
rather critical analysis of the psychology of the sceptic by Targ and Puthoff suggests that as well as looking 
at social psychological and personality correlates of ESP performance, it may be very fruitful to look at the 
psychological characteristics of individuals and the social situations which may determine the kinds of beliefs 
held concerning these issues. Such an analysis might have made a useful contribution to this book, assuming 
the editor could have found a suitable author. 

Nevertheless, some chapters are devoted in an impartial way to looking at various concepts as they relate 
to parapsychology, such as Tart’s chapter on altered states of consciousness, and Burdick & Kelly’s chapter 
on statistical methods in parapsychological research; anyone wishing to engage in research in ESP should find 
the latter chapter particularly helpful. I imagine that some readers might also gain a few insights into modern 
physics from Whiteman’s chapter on parapsychology and physics, though I must admit I became hopelessly 
lost round about Quantum Field Theory. 

My overall impression is that the book should certainly prove of great value to anyone interested in 
parapsychology and can be highly recommended as an excellent source of references for documented 
evidence on ESP, and associated methodology. In this respect it is probably the most comprehensive text 
available. 

However, finally, I must admit that there was one other aspect of the book, apart from the price, which I 
did find rather disappointing. Confronted by this vast mass of significant results and high-powered talk such 
as ‘a six-dimensional tensor calculus of antisymmetric tensors of rank 2’ (p. 741), I still felt that some of my 
naive questions had not been satisfactorily answered. I still do not really know why people with otherwise 
dramatic precognitive capacities do not win the pools every week, why the stars of psychokinesis are not all 
resident in Monte Carlo (maybe they are!), and why poltergeists do not stop cars running over children but 
spend their time moving plant pots and wardrobes around. 

GRAHAM F. WAGSTAFF 


Perspectives in Law and Psychology. 1. The Criminal Justice System. Edited by B. D. Sales. New York: 
Plenum Press. 1977. Pp. xii+268. $19.50. 


This book is the first in a series in which contributions are to be made by lawyers and psychologists in the 
hope that they will help to make the legal system more effective and more just. Six of the contributors to the 
present volume took part in the first ‘law psychology research conference’, and their contributions to this 
book are expansions of the conference papers. The papers cover such varied topics as obscenity, 
communication to juries, insanity, prisoners’ experience and parole decision making. 

The authors take many of the assumptions about human behaviour which lie behind the criminal justice 
system and examine them critically in the light of psychological evidence. For example, Sales and his 
colleagues make a lengthy examination of the judge's instructions to jurors. Most of the United States have 
detailed forms of instruction laid down to minimize reversals of judgements on the grounds of error of 
instruction; it is argued that the form used may be impeccable to a lawyer, but it is probably 
incomprehensible to the layman. There is an extensive review of work on the effects of various aspects of 
language on ease of understanding or remembering prose, with many recommendations. Complex sentences 
should be avoided where possible, and the difficulties encountered with embedded sentences are discussed, 
although the examples used are too straightforward to make the point very forcefully. It is pointed out that 
legal jargon is couched in low frequency words and tends to carry a negative evaluation. Thus ‘to violate the 
law’ seems more serious than ‘to break the law’; this seems an unfortunate example since ‘violation’ may 
carry the additional negative connotations of its association with rape. Many of the suggestions are sensible, 
although some rest more on speculation than evidence, as is freely admitted (for example in the advice on 
avoiding monotony). Many of the issues and assumptions raised are open to empirical investigation — for 
example the state of New Jersey forbids jurors to take notes on the grounds that notes may be misleading 
and undue weight may be placed upon them. Existing research (which Sales and his colleagues ignore) deals 
almost exclusively with student subjects for whom notetaking tends to improve memory (Hartley & Davies, 
1978); this research can only lead to rather tentative conclusions about jurors but may serve as a starting 
point in the discussion. While there is less control of what the British judge says, any contributions 
psychologists could make to greater comprehension in court would clearly be important. 

Goldstein looks at the contribution of the behavioural scientist to knowledge about the effects of 
obscenity. Again, there are many areas awaiting research; some of the findings that do exist would be a 
disappointment to those wishing to impose stricter controls - most sex offenders appeared to have had less 
exposure to erotic materials than had the normal subjects. 

Several of the other papers deal with committing offenders to mental hospitals and with the question of 
dangerousness. It is argued that this is a legal rather than a psychological concept. Many of those committed 
serve indefinite sentences far longer than the maximum prison sentence for which they would have been 
liable for the same offence. It is argued that much tougher criteria for commitment are necessary; the person 
in a mental hospital is seen as suffering an additional stigma to that of being a criminal, and he loses even the 
limited rights of the ordinary offender of access to legal advice (the rules are rather different in Great 
Britain) and trial by his peers. It is argued that it is necessary to make a much clearer distinction between 
the parens patriae and police power functions of the state. (Unfortunately there isn't a clear definition of the 
former function; one infers that it deals with incarceration for the good of the individual rather than that of 
the state.) Although the British procedures differ from those in America, many of the arguments advanced 
are equally relevant here. 

Toch has an interesting paper in which he discusses the development of the prison profile inventory which 
covers the prisoners’ view of such dimensions as privacy, friendship, safety and structure. It is valuable to 
have a systematic study of life in prison as seen by a cross-section of the inmates. The chapter by Carroll & 
Payne uses an attribution theory approach and examines judgement of responsibility and probability of 
change in offenders given differing information about the individual and his environment. Another study 
allowed subjects to request various pieces of information to help them make parole decisions. Subjects 
appeared to work in a hypothesis-testing manner, forming hypotheses about the causes of the offenders’ 
behaviour and requesting information to test these hypotheses. Both students and a sample of people 
responsible for parole judgements gave broadly similar results, though the attribution theory fits students 
better than experts. 

The readership of this book is likely to be rather restricted in Britain — certainly its appeal to 
undergraduates will be limited. Some of the themes covered are probably of less importance in the British 
legal system than in America. Nevertheless there are many issues which could interest lawyers and 
psychologists alike. Not only judges, but all lawyers would learn a great deal about how to communicate in 
court from the contribution of Sales and his colleagues. For the psychologist it is to be hoped that this 
chapter will form a stimulus for research to fill in the acknowledged gaps — indeed, this is true of several of 
the chapters. The overlap between the chapters on mental illness makes them perhaps less interesting, and in 
some the psychological content as opposed to the political is rather thin. 

KATHERINE MUIR 


HARTLEY, J. & DAVIES, I. K. (1978). Notetaking: A 
critical review. Programmed Learning & 
Educational Technology (in press). 


Human Action and its Psychological Investigation. By Alan Gauld & John Shotter. London: Routledge & 
Kegan Paul. 1977. Pp. ix+237. £5.50. 


The problem of the explanation of human action has rightly occupied a central place in recent philosophical 
discussion. In Human Action and Its Psychological Investigation Alan Gauld & John Shotter add to this 
discussion, taking sides with those who see human action as being susceptible to rational understanding 
rather than to scientific explanation. Since this thesis is not new, interest lies in its development, their 
advance primarily consisting in a determined attempt to place a statement of their thesis in the context of 
psychological research. This is commendable, precisely because of the relevance of philosophical theses 
about action to its scientific investigation. That the present book openly discusses the implications of one 
such thesis for psychology is a sufficient reason for its publication. 

The thesis defended by Gauld & Shotter is that explanation in psychology is essentially hermeneutical. 
When a hermeneutical explanation of someone’s action has been given ‘the motives, purposes, principles 
from which he acted have been brought into the open, and so have his beliefs about his then situation, and 
about its bearing upon his goals, etc.’ (p. 7). To ask for an explanation of a human action is to ask for its 
meaning; and this in turn is to ask for a specification of the agent's intentions and beliefs in acting as he did. 
Such an explanation is not causal since an agent can avow his intentions and beliefs without having recourse 
to a putative generalization about how others may act in a similar context (p. 85). Nor is it teleological, the 
authors maintain, for an action’s performance is teleologically explained by its being required for some end, 
yet because of the intensionality of belief an agent may believe his action is necessary for some goal when 
this is in fact not so (p. 51). A hermeneutical approach does not require a barren claim about agent-causality 
(p. 42); though the authors surprisingly admit that when a person acts, his ‘self’ controls the movements of 
his body (pp. 49, 159, 169-179). Finally, Gauld & Shotter accept that the relation between the descriptions of 
an intention and a relevant action is non-contingent (pp. 41, 86), whilst allowing the possibility of there being 
a causal link between the events so described (p. 152). 

Mechanism is seen as a chief competitor to a hermeneutical approach. A mechanistic explanation will be 
atomistic (p. 215), reductionist (pp. 67, 215), causal (pp. 85, 87) and revisionary (p. 194). The authors reject 
mechanism using an argument whose implications for the study of machine intelligence are obvious. A 
generalized machine is one whose mode of operation can be described in terms of a machine table, in terms 
of that which can ‘specify for any given “internal” state of the machine what “output” will result from a 
given input and what internal state the machine will move to next’ (pp. 15-16). Since a mechanistic 
explanation can be represented in machine table terms, it follows that mechanism fails to the extent that a 
human agent can do anything that a generalized machine cannot do (p. 17). In particular, the authors argue 
that a machine could be programmed neither to acquire, nor to understand and use, a stimulus-neutral 
concept (e.g. that of game, all the instances of which do not share a common set of features) since there can 
be no exhaustive specification of the features possessed by that concept or its instances (pp. 31-32); nor to 
acquire, understand or use intelligently any concept (e.g. that of a pound note) that requires the possession 
of other concepts with affinities to stimulus-neutral concepts (pp. 35-38). 

What is curious about this argument is why the authors should think it succeeds. It assumes that all 
psychological explanations are mechanistic; which very assumption is admitted to be false (p. 17). The object 
of the argument is to refute attempts to give a mechanistic explanation of human ability; which leaves 
untouched the attempt to give a mechanistic explanation of human action. For an infant can display 
intentional behaviour, the authors admit (pp. 200-201, 211), without possessing the concepts that an adult 
may ascribe in attempting to interpret it. Moreover, even if the argument is conceded, there remains the 
problem of showing how a hermeneutical explanation can be given of the abilities to acquire, understand and 
use stimulus-neutral concepts. Such an attempt would be circular since ex hypothesi the agent (in this case 
infant) must be able to provide a specification of the intentions and beliefs operative in displaying such 
abilities - and he would need to possess such concepts to be able to do precisely that! 

What does the hermeneutical approach mean for psychology? Neurophysiologists are precluded from 
identifying neural control centres regulating purposeful behaviour (p. 65); any account of social behaviour 
that describes human interaction by analogy with the behaviour of physical events is wrong (p. 71); crucial 
experiments and concentration of empirical research on a restricted range of cases are misplaced (p. 190); 
and experimental results may be interpreted by specification of the cognition of participating subjects rather 
than by recourse to independently operating variables (p. 193). Future psychological research could profit by 
(more) conceptual analysis (pp. 86, 88); should examine actions where underlying intentions are obscure (p. 
80); or examine the development of intentionality in children (pp. 83, 200-201). 

The authors are however confusing about the status of their thesis. At times they modestly present it as 
the claim that the hermeneutical approach is a possible approach to action-explanation (pp. vii, 13); at other 
times, they commend it on methodological grounds — research that treats the hermeneutical approach 
seriously will be fruitful (pp. 202, 211); at yet other times its full metaphysical nature is revealed when it is 
stated to be the only possible type of action-explanation (pp. 78, 214). The last of these is of course extreme 
and unwarranted. Gauld & Shotter concentrate on intentional actions and ignore the many types of 
non-intentional actions; ignore the fact that an action’s being intentional is dependent on its description, so 
that a hermeneutical explanation is not possible for the action described under its non-intentional 
descriptions; and ignore the distinction between an explanandum and its explanans if it is insisted that an 
action’s being intentionally described necessitates a hermeneutical explanation of it — for if the intentional 
description of an action necessitates an explanation of the action in terms of the agent’s intentions and 
beliefs, that action is re-described and not explained in the scientific sense at all. 

Gauld & Shotter frequently show an enviable knowledge of the recent philosophical, and psychological, 
literature. It is therefore difficult to understand why the work of Davidson, Pears and von Wright has been 
by-passed; why better use was not made of the work of Malcolm, Melden and Wittgenstein, for example in 
the discussions of the Logical Connexion Argument or of the experience of intention; or why Taylor's 
position on teleological law is misquoted and misunderstood (p. 50). 

LESLIE SMITH 